INTRODUCTION

The fact that actuarial science is fundamentally
a branch of biology rather than of mathematics is 
overlooked far more generally than ought to be the 
case. Most people, even those of education and wide 
culture, are inclined to look upon an actuary as a 
particularly crabbed, narrow, and intellectually dusty 
kind of mathematician. In reality his subject is one 
of the liveliest in the whole domain of biology, and 
none surpasses it in its practical interest and import- 
ance to mankind. Because, what the actuary is, or 
at least should be, trying always to formulate more 
and more definitely are the laws which determine 
the duration of human life. Why the actuary in fact 
is too often intellectually but little more than a sort 
of glorified computer, is really only the result of a 
defect in the teaching of biology in our colleges and 
universities. It has only lately come to be recognized 
anywhere that a biologist needed a substantial founda- 
tion in mathematics in order successfully to practice 
a biological profession. It is not too rash a prediction
to say that presently the time is coming when no 
important actuarial post will be held by a mathe- 
matician who knows little or no biology. The vigor 
and originality of his biological outlook will be valued 
as highly as the rigidity of his mathematical sub- 
structure now is. 




The thing which chiefly makes this book by my 
friend Arne Fisher notable, lies, in a broad sense, 
in the fact that it is a highly original and absolutely 
novel essay in general biology. The language is to a 
considerable extent mathematical, to be sure, but the 
subject matter, the mode of logical approach, and the 
significant conclusion — all these are pure biology. 
Unfortunately many biologists will not be able to 
appreciate its significance, or even to read it intel- 
ligently. But this is their loss, and at the same time 
an exposure of the dire poverty of their intellectual 
equipment for dealing with the problems of their 
science. 

There are two broad features of Fisher's work 
which want emphasis. The first is the successful 
construction of a life table from a knowledge of deaths 
alone. That the construction is successful his results 
set forth in this book abundantly demonstrate. To 
have done this is a mathematical and actuarial 
achievement of the first rank. It may fairly be 
regarded as fundamentally the most significant ad- 
vance in actuarial theory since Halley. It opens out 
wonderful possibilities of research on the laws of 
mortality, in directions which have hitherto been 
wholly impossible of attack. The criterion by which 
the significance of a new technique in any branch of 
science is evaluated, is just this of the degree to which 
it opens up new fields of research. By this criterion 
Fisher's work stands in a high and secure position. 

But of vastly more significance considered purely 
as an intellectual achievement is his discovery of 
the fundamental biological law relating the several 
causes of death to each other, which made the tech- 
nical accomplishment possible. More than one accepted 
textbook on vital statistics has scornfully instructed
its readers that no good whatever could come from 
any tabulation or study of death ratios; that they must 
be avoided as the pestilence by any statistician who 
would be orthodox. But orthodoxy and discovery are 
as incompatible intellectually as oil and water are 
physically, a cosmic law often overlooked by our
"safe and sane" scientific gentry. This book is an
outstanding demonstration that this law is still in
operation. Fisher has had the temerity to study the
ratios of deaths from one cause or group of causes
to those from another group, or to all causes together,
and has discovered that there abides a real and
hitherto unsuspected lawfulness in these ratios. Here
again his pioneer work opens out alluring vistas to
the thoughtful biometrician.

Altogether we of America are to be warmly 
congratulated that this brilliant Danish mathematical 
biologist has chosen to come and live with us. 

Baltimore, November 1921. 

Raymond Pearl. 



AUTHOR'S PREFACE 



The classical method of measuring mortality rests
essentially upon the fundamental principles first 
enunciated by the British astronomer, Halley, in his 
construction of the famous Breslau Life Table. Since 
the time of Halley this method has been so thoroughly 
investigated and has been perfected to such an extent 
that new developments along this line cannot be 
expected. Any improvements on the original principles 
of Halley are after all nothing but refinements in 
graduating methods; and even in this line it appears 
that the limit of further perfection has been reached. 

Halley's method, which is purely empirical in 
scope and principle, rests primarily upon the know- 
ledge of the number of persons exposed to risk at 
various ages and the correlated number of deaths 
among such exposures. In all cases where such 
information is at hand the old and tried method meets 
all requirements to our full satisfaction; and it would 
appear superfluous to try to supplant it with fun- 
damentally different principles. 

In presenting the new method outlined in this 
little book I wish to state most emphatically that it 
has never been my intention to try to supersede the 
conventional methods of construction of mortality
tables wherever such methods are applicable. My 
proposed method is only a supplement to the former 
tools of statisticians and actuaries, and aims to
utilize numerous statistical materials to which the 
older system of Halley is not applicable. The idea, 
whether it is new or not, meets in reality a very 
frequent need in mortality investigations. It is a well 
known fact that in the determination of certain 
statistical ratios, it is easier to determine the nume- 
rator than the denominator, as for instance in life 
or sickness assurance, where the losses can be 
ascertained with a very close degree of accuracy, 
while the collection of persons exposed to risk at 
various ages is often difficult to obtain. Similar 
remarks hold true in the case of numerous statistical 
summaries of mortuary records as published in most 
government reports on vital statistics. The desire to 
utilize this enormous statistical material was what 
led me to try the proposed method. 

In principle the plan is fundamentally different 
from that of the empirical method of Halley, inasmuch 
as I have attempted to substitute the inductive 
principle for that of pure empiricism. 

In the first place, I consider the $d_x$ curve, or the
number of deaths by attained ages among the
survivors of an original cohort of say 1,000,000
entrants at age 10, as being generated as a compound
curve of a limited number (say 8 or less) of subsidiary
component curves of either the Laplacean-Charlier or
Poisson-Charlier type.

The method of induction now consists in deter- 
mining the constants or parameters of these sub- 
sidiary curves. These parameters fall into two 
separate categories: — 

A. The statistical characteristics or semi-invariants
which determine the relative frequency distribution
by attained age at death, as expressed by the
mean, the dispersion, the skewness and the excess
of each subsidiary or component curve.

B. The areas of each subsidiary or component 
curve. 
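
The decomposition described above, a compound curve built from a few component curves that each carry an area and four statistical characteristics, can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the common textbook form of the Gram-Charlier Type A series truncated after the excess term (the book's own coefficient conventions may differ), and the function names and the two sets of parameters are hypothetical, not taken from the text.

```python
import math


def charlier_type_a(x, mean, dispersion, skewness=0.0, excess=0.0):
    """Relative frequency at age x for one component curve.

    Assumed form: normal density times
    1 + (skewness / 6) * He3(z) + (excess / 24) * He4(z),
    with z the standardized age. This is the usual textbook
    truncation, adopted here as an assumption.
    """
    z = (x - mean) / dispersion
    normal = math.exp(-0.5 * z * z) / (dispersion * math.sqrt(2.0 * math.pi))
    he3 = z**3 - 3.0 * z
    he4 = z**4 - 6.0 * z**2 + 3.0
    return normal * (1.0 + skewness / 6.0 * he3 + excess / 24.0 * he4)


def compound_dx(ages, components):
    """Add area * relative frequency over all component curves, age by age."""
    return [
        sum(
            c["area"]
            * charlier_type_a(
                x, c["mean"], c["dispersion"], c["skewness"], c["excess"]
            )
            for c in components
        )
        for x in ages
    ]


# Hypothetical parameters for two groups of causes of death.
# Category A: mean, dispersion, skewness, excess. Category B: area.
components = [
    {"area": 300_000, "mean": 35.0, "dispersion": 12.0, "skewness": 0.3, "excess": 0.0},
    {"area": 700_000, "mean": 70.0, "dispersion": 11.0, "skewness": -0.2, "excess": 0.1},
]

ages = range(10, 101)
d_x = compound_dx(ages, components)
```

A truncated series of this kind can dip slightly below zero in the far tails, so a practical computation would need to inspect the extreme ages.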

The working hypothesis which I have put forward 
is that the relative frequency distribution of deaths by at- 
tained ages, classified according to a limited number of 
groups (generally 8 or less) of causes of death among the 
survivors of the original cohort of entrants, tends to cluster
around certain ages in such a way that it is possible from 
biological considerations to estimate in practice with a 
sufficiently close degree of approximation the statistical 
characteristics or semi-invariants of the relative frequency 
distributions of the component curves, corresponding to a 
previously chosen classification of causes of death (into 8 
or less subsidiary groups). 

This implies briefly that I suppose it is possible 
from biological considerations to select a priori the 
statistical characteristics of the category as mentioned 
above under A. 

Once this hypothesis is accepted as a true supposi- 
tion, the areas of each of the component curves can 
be determined by purely deductive methods (as for 
instance the method of least squares) from the 
observed values of the proportionate death ratios
$R_B(x)$ ($x = 10, 11, 12, \ldots, 100$; $B = \mathrm{I}, \mathrm{II}, \mathrm{III}, \ldots$)
corresponding to the groups of causes of death.

Thus the parameters as determined in this 
manner exhaust the given statistical material, i.e. 
the observed proportionate death ratios $R_B(x)$. A
mere addition of the subsidiary or component curves
gives us then the compound $d_x$ curve from which it
is an easy task to find the functions $l_x$ and $q_x$.
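
The last step, passing from the compound $d_x$ curve to $l_x$ and $q_x$, can be illustrated as follows. The sketch relies only on the usual life-table identities ($l_x$ is the sum of deaths at age $x$ and later, and $q_x = d_x / l_x$) and assumes that the cohort is extinct after the last tabulated age. The function name and the four-age example are hypothetical.

```python
def life_table_from_deaths(d):
    """Return (l, q) for a list of deaths d at successive ages.

    Assumes every member of the cohort dies within the tabulated ages,
    so the number of entrants equals the total number of deaths.
    """
    survivors = sum(d)
    l = []
    for deaths in d:
        l.append(survivors)      # survivors at the start of this age
        survivors -= deaths      # those remaining for the next age
    q = [dx / lx if lx > 0 else float("nan") for dx, lx in zip(d, l)]
    return l, q


# Hypothetical deaths at four successive ages.
d = [100, 200, 300, 400]
l, q = life_table_from_deaths(d)
print(l)  # survivors at each age
print(q)  # rates of mortality at each age
```

At the final age the rate of mortality is necessarily 1, which reflects the assumption that the table closes there.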

The scheme as we have briefly outlined it above 
is, therefore, not a cut-and-dried doctrine or a sort 
of "mathematical alchemy" as some of my critics 
have implied. Nor is it an authoritative or infallible 
dogma. The keystone upon which its success depends 
is merely a working hypothesis; i.e. a temporary or 
preliminary supposition. I suppose something to be 
true and try to ascertain whether, in the light of that 
supposed truth, certain facts fit together better than 
they do with any other supposition hitherto tried. 

The validity of the working hypothesis must, in 
my opinion, be proved or disproved either by
independent methods and principles of construction
of mortality tables, such as for instance the empirical
principle of Halley, hitherto exclusively used by the
actuaries, or through additional biological studies.¹



¹ The biological basis of Mr. Fisher's working hypothesis, which is
of far greater importance than the purely ancillary mathematical deduction,
has apparently been overlooked by many of his American critics,
such as Little, Thompson and Carver. Dr. Carver in the Proceedings
of the Casualty Actuarial Society of America (Vol. VI, page 357)
remarks that "if we can construct a table from death alone as in Proc.
Vol. IV, and by dividing these deaths by $q_x$, determine the unenumerated
population, why not the converse?"

The answer to this remark is obvious. In the case of mortuary
records, Fisher considered two different and distinct attributes, namely
1) the purely quantitative attribute of attained age at death, and 2) the
purely biological attribute of cause of death, which in conjunction with
the working hypothesis to a certain extent aims to replace the unknown
exposures. If we were to follow Dr. Carver's facetious suggestion and, to
use his phrase, "go the proposed plan one better by using enumerated
populations only", we should, however, encounter a statistical series with
the single attribute of attained age only, but no second attribute corresponding
to that of the biological factor of the cause of death. Criticisms
of the sort of Dr. Carver's bring to light the fundamentally different
principles applied by Mr. Fisher in sharp contradistinction to the purely
empirical methods of the orthodox actuary and statistician.

Translator.

In the meantime I feel justified in presenting to
my readers the practical results obtained by this
method, which although perhaps not unimpeachable
in respect to mathematical rigour, nevertheless in my
opinion offers a means to attack a vast bulk of
collected statistical data against which our former
actuarial tools proved useless. The celebrated Russian
mathematician Tchebycheff once made a remark to
the effect that in the antique past the Gods proposed
certain problems to be solved by man, later on the
problems were presented by half-gods and great men,
while now dire necessity forces us to seek some
solution to numerous practical problems connected
with our daily conduct. The problem towards which
I have made an attempt to offer a sort of solution in
the present little essay is one of these numerous
problems of dire necessity mentioned by Tchebycheff,
and I hope that my work along this line, imperfect
as it is, may nevertheless prove a beginning towards
more improved methods in the same direction.

In conclusion I wish to extend my thanks to a
number of friends and colleagues both in America
and Europe and Japan who have kept on encouraging
me in my work along these lines in spite of much
adverse criticism from certain statistical and actuarial
circles. I wish in this connection to thank Mr. F. L.
Hoffman, Statistician of the Prudential Insurance
Company, for permitting me to apply the method to
various collections of mortuary records while working
as a computer in his department. My thanks are also
due to Mr. E. A. Vigfusson for making the translation
from my rough Danish notes. If the resulting
English is perhaps open to criticism, I beg to remind
the reader that my original manuscript was written
in Danish and translated into English by an Icelander,
while the composition and proof reading was done
by a Copenhagen firm.

To Professor Glover of the University of Michigan 
I also wish to extend my thanks for inviting me to 
deliver a series of lectures on the construction of 
mortality tables before his classes in actuarial 
methods during the month of March 1919. This 
invitation afforded me the first opportunity to bring 
the proposed method before a professional body of 
statistical readers. 

Last but not least I desire to acknowledge my 
obligations to Professor Pearl whose introductory 
note I consider the strongest part of the book. In 
these departments of knowledge the appreciation of 
one's peers is after all the only real reward one can 
possibly expect. The fact that this eminent biologist 
has recognized that the nucleus of the whole problem 
is of a purely biological nature, and that the 
mathematical analysis is merely ancillary, is 
particularly pleasing to me, because it represents my 
own view in this particular matter. 

p. t. Newark, U. S. A., November 1921. 

Arne Fisher. 



TRANSLATOR'S PREFACE 



During the spring of 1919 the attention of the 
present writer was called to a brief paper entitled 
Note on the Construction of Mortality Tables by means of 
Compound Frequency Curves by the Danish statistician,
Mr. Arne Fisher. The novelty and originality of this
paper impressed me to such an extent that I became
desirous of obtaining more detailed information about
the process than that which necessarily was contained
in the above summary note, originally printed in the
Proceedings of the Casualty and Actuarial Society of
America. 

I wrote therefore to Mr. Fisher and inquired 
whether he intended to publish any further studies 
on this subject. From his reply I learned that he had
delivered a series of lectures on this very topic before 
Professor Glover's insurance classes at the University 
of Michigan during the month of March 1919, but that 
the proposed method had been met with such captious 
opposition in certain actuarial circles that he had 
decided to abandon the plan of publishing anything 
further on the subject and had even destroyed the 
English notes prepared for the Michigan lectures. 

In the meantime the proposed scheme had 
received considerable attention in actuarial circles in 
Europe and Japan and several highly commendatory 
reviews had appeared in the English and Continental
insurance periodicals and various scientific journals,
notably the Journal of the Royal Statistical Society and
the Bulletin de l'Association des Actuaires Suisses. The
proposed method seemed indeed so novel and unique 
that I could not help feeling that it deserved a 
better fate than that of being forgotten. I sug- 
gested therefore to Mr. Fisher that he prepare a 
new manuscript. But unfortunately his time did not 
allow this. He consented, however, to turn over to 
me his original Danish notes on the subject from
which he had prepared his Michigan lectures and 
permitted me to make an English translation for the 
Scandinavian Insurance Magazine. I gladly availed 
myself of this opportunity to bring this fundamental 
work before an international body of readers and 
started on the translation in the summer of 1919. 

At the same time Mr. Fisher decided to put the 
proposed method and working hypothesis to a very 
severe test, which would meet even the most stringent 
requirements of some of his critics and their conten- 
tion that the method would fail in the case of a 
rapidly changing population group. For this purpose 
he selected a series of statistical data contained in the
annual reports and statements of a number of the 
leading Japanese Life Assurance Offices, relating to 
their mortuary records for the four year period from 
1914—1917. More than 35,000 records of male lives, 
arranged according to the Japanese list of causes of 
death and grouped in quinquennial age intervals 
formed the basis for the construction of the final 
life table which was completed in November 1919. 
This table, which like Mr. Fisher's other tables was 
derived without any information of the number of
lives exposed to risk at various ages, is shown in the
addenda of this treatise. 

Immediately after its construction Mr. Fisher sent
this table to the well known Japanese actuary, Mr.
T. Yano, and asked him for an opinion regarding the
trustworthiness of the final death rates of $q_x$ as
derived by his new method. The Japanese actuary's
answer arrived in April 1920. Mr. Yano had after
the receipt of Mr. Fisher's letter ascertained the
exposures and deaths among male lives at each
separate age for about 40 Japanese life offices during
the period 1914-1917 and constructed by means of
the conventional methods a complete series of $q_x$ by
integral ages from age 10 to 90. These ungraduated
data are shown as a broken line polygon in the
appended diagram (Figure 1). In spite of the fact that
Mr. Fisher had no information whatever about the
exposed to risk the agreement of the continuous curve
of $q_x$ as determined by the frequency curve method
with Mr. Yano's ungraduated data is so close that
I think further comments superfluous. The slight
differences in younger ages might indeed arise from
the fact that Mr. Yano had access to all the experience
(containing more than 45,000 deaths) of all the
Japanese companies, whereas Fisher only used the
mortuary records as published by some of the leading
Japanese companies.

Fig. 1.

Like all scientific methods of induction Mr. Fisher's
proposed plan rests upon a working hypothesis,
namely that it is possible from biological considerations
to group the deaths among the survivors at
various ages in any mortality table according to
causes in such a manner that their percentage or
relative frequency distribution according to attained
age at death will conform to a previously selected
system or family of Laplacean-Charlier or Poisson-Charlier
frequency curves. Mr. Fisher himself is very
frank in stating that this is a working hypothesis
upon which hinges the success of the whole method.
One of the main objections of his critics is that it 
seems impossible to prove the truth of this working 
hypothesis. Naturally its truth cannot be proved by 
mathematics or logic any more than we can prove
or disprove the existence of Euclidean space, which
in itself constitutes a working hypothesis for most
of our applied mathematics. Mr. Fisher's critics might
as well be asked to prove or disprove Newton's
hypothetical laws of motion and attraction as
extended by Maxwell and Hertz, or the newer
hypothesis recently put forward by the relativists,
or the Lorentz hypothesis of contraction. It would
indeed be a terrific blow to science and the extension
of knowledge if it were required that no working
hypothesis be allowed in scientific work unless
such hypothesis could be proved to be true. What
position would biology occupy to-day if biologists had
insisted that Darwin's great hypothesis be proved
before it could be allowed as a foundation in the study
of evolution?

The most convincing answer to Mr. Fisher's 
captious critics among the old school of actuaries 
and statisticians is, however, the undisputed fact that 
his working hypothesis as such really does work. 
As pointed out by Dr. Pearl in the introductory note 
of this book the results set forth in the present 
treatise abundantly demonstrate this fact. The 6 
widely different mortality tables as shown in the 
addenda stand as mute and yet as the most eloquent 
evidence to the fact that the method works. It might 
indeed not appear impertinent to suggest that Mr.
Fisher's actuarial critics would render a greater
service to their profession by proving that these six
mortality tables cannot be considered as reasonable
approximations to tables derived by orthodox means
from the same population groups than by starting
to pooh-pooh and ridicule his proposed method.

Winnipeg, Canada, November 1921. 

E. A. Vigfusson. 



"Nothing is less warranted in science than an uninqui- 
ring and unhoping spirit. In matters of this kind, those 
who despair are almost invariably those who have never 
tried to succeed." 

W. Stanley Jevons. 



CHAPTER I 

(TRANSLATED BY MISS DICKSON) 



AN INTRODUCTION TO THE THEORY OF
FREQUENCY CURVES

1. INTRODUCTION. The following method of constructing
mortality tables from mortuary records by sex, age
and cause of death rests essentially upon the
theory of frequency curves originally introduced 
by the great Laplace and of recent years further 
developed and extended through the elegant and 
far reaching researches of the Scandinavian school 
of statisticians under the leadership of Gram, 
Charlier and Thiele and their disciples. This 
method is, however, comparatively little known 
and unfortunately not always fully appreciated 
by the majority of English statisticians and ac- 
tuaries, who prefer to apply the well known 
methods of the eminent English biometrician, 
Karl Pearson. For this reason it may be advisable 
to give a preliminary sketch of Charlier's methods
so as to obtain a better understanding of the
following chapters dealing with the more specific
problem of mortality tables. The treatment must 
necessarily be brief and represents essentially an 
outline of the more detailed theory which I hope 
to present in my forthcoming second volume of 
the Mathematical Theory of Probabilities. 

By the method of Charlier any frequency 
function is expressed as an infinite series rather 
than as a closed and compact algebraic or tran- 
scendental expression by the Pearsonian methods. 
By power series the thoughts of the majority of 
students are associated with the famous series 
which bear the names of Taylor and Maclaurin. 
In these series the function is derived as an in- 
finite series of ascending powers of the inde- 
pendent variable whose coefficients are expressed 
by means of the correlated successive derivatives 
of the function for specific values of f(x). Thus 
for instance we know that the Maclaurin series 
may be written as follows : 

$$f(x) = f(0) + \frac{x}{1!} f'(0) + \frac{x^2}{2!} f''(0) + \ldots + \frac{x^n}{n!} f^{(n)}(0) + \ldots$$

where $f^{(n)}(0)$ is the symbol for the value of the $n$th
derivative when $x = 0$ and $n = 1, 2, 3, 4, \ldots$.
There are, however, contrary to the belief of 
many immature students, only comparatively few 
functions which allow a rigorous expansion by this 
method, in which the derived functions and the
differential calculus play the leading roles. 

But on the other hand there are other methods 
of expansions in infinite series which are more 
general and by which the coefficients of the in- 
dependent variable are expressed by operations 
other than those of differentiation. One of these 
methods is to express the coefficients as definite 
integrals either of the unknown function itself or 
some auxiliary function. 
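
The idea of obtaining series coefficients from definite integrals rather than from derivatives can be illustrated numerically. The sketch below is not taken from the text. It assumes the standard Hermite-polynomial form of such an expansion, $f(x) = \sum_n c_n He_n(x)\varphi(x)$ with $\varphi$ the standard normal density, for which orthogonality gives $c_n = \frac{1}{n!}\int f(x) He_n(x)\,dx$. The function names, the trapezoidal rule, the integration limits and the example density are all illustrative assumptions.

```python
import math


def hermite(n, x):
    """Hermite polynomial He_n(x) via He_{k+1} = x*He_k - k*He_{k-1}."""
    h_prev, h = 1.0, x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h


def integrate(func, a, b, steps=4000):
    """Plain trapezoidal rule on [a, b]."""
    width = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for i in range(1, steps):
        total += func(a + i * width)
    return total * width


def series_coefficient(f, n, a=-10.0, b=10.0):
    """c_n = (1/n!) * integral of f(x) * He_n(x) over [a, b]."""
    return integrate(lambda x: f(x) * hermite(n, x), a, b) / math.factorial(n)


def example_density(x, mu=0.5):
    """Hypothetical frequency function: a normal curve with mean mu, dispersion 1."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)


coefficients = [series_coefficient(example_density, n) for n in range(4)]
print(coefficients)
```

For this particular example the integrals can be checked by hand, since the coefficients of a unit-dispersion normal curve displaced by `mu` are `mu**n / n!`. No derivative of the frequency function is needed at any point, only sums over its observed values.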

The range of practical problems which lay 
themselves open to a successful attack along those 
lines is much wider than the corresponding range 
of practical problems to which we may apply the 
Taylor series. 

Speaking generally as a layman (who continu- 
ously has to face practical rather than abstract 
problems) and specifically as a mathematical 
novice (who considers mathematics as a means 
rather than as an end) this fact appears to me 
quite obvious from a purely philosophical point of 
view. In nature and in all practical observations 
we encounter finite and not infinitesimal quantit- 
ies. In other words, what we actually observe are 
finite sums or definite integrals, i. e. the limit of 
a sum of infinitely small component parts. 

The definite integral rather than the derivative 
and the differential seems, therefore, to be the 
more elementary and primitive operation and the
one which suggests itself first hand. The history of
mathematics indeed proves this contention. Archimedes
had (as shown by the researches of the
Danish scholar, Heiberg) laid the essential foundation
for an integral calculus about 250 B. C.
And more than 21 centuries later, almost simultaneously
with the historical discovery of Heiberg, another
Scandinavian, the Swedish mathematician
and actuary, Fredholm, gave to the world his
epoch-making work on integral equations. Fredholm's
monumental memoir "Sur une nouvelle
méthode pour la résolution du problème de Dirichlet"
was first published in the "Öfversigt af Akademiens
Förhandlingar" (Stockholm 1900). Measured
by time the subject of integral equations is
thus a mere infant in the history of mathematical 
discoveries. Measured by its importance it has 
already become a classic. Its application to a 
steadily increasing number of essentially practical 
problems in almost every branch of science has 
placed it in a central position of modern mathe- 
matical research and it bids fair to become the 
most important branch of mathematics. 

Fredholm in introducing his now famous in- 
finite determinants, known as the Fredholmean 
determinants, had a forerunner in the Danish 
actuary, Gram, whose Doctor's dissertation "Om
Rækkeudviklinger ved de mindste Kvadraters Metode"
(Copenhagen 1879) gave prominence to a
certain class of functions which later on have 
become known as orthogonal functions, and by 
which Gram actually gave the first expansion of 
a frequency distribution or frequency curve in 
an infinite series. Scandinavians in general and 
Scandinavian actuaries in particular may, there- 
fore, feel proud of their share of imparting know- 
ledge on this important subject, which makes a 
strong bid to place mathematics on a higher plane 
than ever before, not alone as an abstract but 
equally well as an applied science. The genius 
of the Italian renaissance Leonardo da Vinci, as 
early as 1479 proclaimed "that no part of human 
knowledge could lay claim to the title of science 
before it had passed through the stage of mathe- 
matical demonstration". Comparatively few bran- 
ches of learning measure up to the standard of 
Leonardo da Vinci, and our learned friends among 
the economists and sociologists have a long road 
to travel before they succeed in placing their 
methods in the coveted niche of science. But the 
new vistas of possibilities opened up to them by 
means of M. Fredholm's discovery ought to 
furnish them a powerful tool towards the attain- 
ment of the high standard set by the great Italian. 
The principal theorems of integral equations
are bound to be especially fruitful in their application
to mathematical statistics and the problems
of frequency curves and frequency surfaces
together with the associated problems of mathematical
correlation.

2. FREQUENCY DISTRIBUTIONS AND FUNCTIONS. If $N$ successive
observations originating from the same essential
circumstances or the same source of causes are made
in respect to a certain statistical variate, $x$, and if
the individual observations $o_i$ ($i = 1, 2, 3, \ldots, N$)
are permuted in an ascending order then this particular
permutation is said to form a frequency distribution
of $x$ and is denoted by the symbol $F(x)$.

The relative frequencies of this specific permutation,
that is the ratio which each absolute
frequency or group of frequencies bears to the
total number of observations, is called a relative
frequency function or probability function and is
denoted by the symbol $\varphi(x)$.

If the statistical variate is continuous or a
graduated variate, such as heights of soldiers,
ages at death of assured lives, physical and astronomical
precision measurements, etc., then

$$\varphi(z)\,dz$$

is the probability that the variate $x$ satisfies the
following relation

$$z - \tfrac{1}{2}\,dz < x < z + \tfrac{1}{2}\,dz$$

or that $x$ falls between the above limits.

If the statistical variate assumes integral (discrete)
values only such as the number of alpha
particles radiated from certain metals and radioactive
gases as polonium and helium, number of
fin rays in fishes, or number of petals in the flowers
of plants, then $\varphi(z)$ is the probability that $x$ assumes
the value $z$. From the above definitions it follows
a fortiori that

(a) $F(z) = N\varphi(z)$ (Integral variates)

(b) $F(z)\,dz = N\varphi(z)\,dz$ (Integrated variates)

Interpreting the above results graphically we 
find that (a) will be represented by a series of 
disconnected or discrete points while (b) will be 
represented by a continuous curve. 
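
For a discrete variate the relation between the absolute frequencies $F(z)$ and the relative frequencies can be shown with a small count. The observations below are hypothetical, and the variable names are chosen only for this illustration.

```python
from collections import Counter

# Hypothetical discrete observations, e.g. fin-ray counts in ten fishes.
observations = [5, 3, 4, 5, 6, 5, 4, 3, 5, 4]
N = len(observations)

# Arranging the observations in ascending order and counting gives F(z).
absolute = Counter(sorted(observations))

# Dividing each absolute frequency by N gives the relative frequency.
relative = {z: absolute[z] / N for z in sorted(absolute)}

for z, phi in relative.items():
    print(z, absolute[z], phi)
```

Plotted against `z`, the values form a set of disconnected points, one for each value the variate actually assumed.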

As to the function $\varphi(z)$ we make for the
present no other assumptions than those following
immediately from the customary definition of
a mathematical probability. That is to say the
function $\varphi(z)$ must be real and positive.

Moreover it must also satisfy the relation

$$\int_{-\infty}^{+\infty} \varphi(z)\,dz = 1,$$

or in the case of discrete variates:

$$\sum_{z} \varphi(z) = 1$$

which is but the mathematical way of expressing 
the simple hypothetical disjunctive judgment that 
the variate is sure to assume some one or several 
values in the interval from $-\infty$ to $+\infty$. The
zero point is arbitrarily chosen and need not coincide
with the natural zero of the number scale.
Thus for instance if we in the case of height of
recruits choose the zero point of the frequency
curve at 170 centimeters an observation of 180
centimeters would be recorded as +10 and an
observation of 160 centimeters as -10.
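
The two remarks above, that the relative frequencies add up to unity and that the zero point may be moved at will, can be checked on a handful of hypothetical recruit heights. The data and names below are invented for the illustration.

```python
from collections import Counter

# Hypothetical heights of recruits in centimeters.
heights_cm = [160, 170, 180, 170, 175, 165, 170]

# Arbitrary zero point, as in the example in the text.
origin_cm = 170
deviations = [h - origin_cm for h in heights_cm]


def relative_frequencies(values):
    """Relative frequency of each distinct value, in ascending order."""
    counts = Counter(values)
    n = len(values)
    return {v: counts[v] / n for v in sorted(counts)}


phi_raw = relative_frequencies(heights_cm)
phi_shifted = relative_frequencies(deviations)

print(deviations)
print(sum(phi_shifted.values()))
```

Moving the zero point relabels the abscissae but leaves every relative frequency unchanged.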

3. PROPERTY OF CONSTANTS OR PARAMETERS. In regard to a
frequency function we may assume a priori
that it will depend only upon
the variate $x$ and certain mathematical relations
into which this variate enters with a number of
constants $\lambda_1, \lambda_2, \lambda_3, \lambda_4, \ldots$ symbolically expressed
by the notation

$$F(x, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \ldots)$$

where the $\lambda$'s are the constants and $x$ the variate.
All these constants or parameters are naturally
independent of $x$ and represent some peculiar properties
or characteristic essentials of the frequency
function as expressed in the original observations
$o_i$ ($i = 1, 2, 3, \ldots, N$). We may, therefore,
say that each constant or statistical parameter
entering into the final mathematical form for the
frequency function is a function of the observations
$o_i$. This fact may be expressed in the following
symbolic form:

$$\lambda_1 = S_1(o_1, o_2, o_3, \ldots, o_N)$$
$$\vdots$$
$$\lambda_N = S_N(o_1, o_2, o_3, \ldots, o_N).$$

But from purely a priori considerations we are able to tell something else about the functions $S_i$ $(i = 1, 2, 3, \ldots, N)$. It is only when permuting the various o's in an ascending magnitude according to the natural number scale that we obtain a frequency function. This arrangement itself has, however, no influence upon any one of the o's which were generated before this purely arbitrary permutation took place. The ultimate and previously measured effects of the causes as reflected in each individual numerical observation, $o_i$, depend only upon the origin of causes which form the fundamental basis for the statistical object under investigation and do not depend upon the order in which the individual o's occur in the series of observations.

Suppose for instance that the observations occurred in the following order

$$o_1, o_2, o_3, \ldots, o_N.$$

By permuting these elements in their natural order we obtain the frequency distribution F(x). But the very same distribution could have been obtained if the observations had occurred in any other order as for instance

$$o_7, o_9, o_N, \ldots, o_3, \ldots, o_1,$$

so long as all of the individual o's were retained in the original records. Or, to take a concrete example, consider the study of the number of policyholders according to attained ages in a life assurance office. We write the age of each individual policyholder on a small card. When all the ages have been written on individual cards they may be permuted according to attained age and the resulting series is a frequency function of the age x. We may now mix these cards just as we mix ordinary playing cards in a game of whist, and we get another permutation, in general different from the order in which we originally recorded the ages on the cards. But this new permutation can equally well be used to produce the frequency function if we are only sure to retain all the cards and do not add any new cards.

4. PARAMETERS AS SYMMETRIC FUNCTIONS

The various functions $S_i(o_1, o_2, o_3, \ldots, o_N)$ are, therefore, symmetric functions, that is, functions which are left unaltered by arbitrarily permuting the N elements o, and no interchange whatever of the values of the various o's in those symmetric functions can have any influence upon the final form of the frequency function or frequency curve, F(x).

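We now introduce under the name of power sums a certain well known form of fundamental symmetrical functions defined by the following relations

$$\begin{aligned}
s_0 &= o_1^0 + o_2^0 + o_3^0 + \ldots + o_N^0 = N \\
s_1 &= o_1^1 + o_2^1 + o_3^1 + \ldots + o_N^1 = \sum o_i^1 \\
s_2 &= o_1^2 + o_2^2 + o_3^2 + \ldots + o_N^2 = \sum o_i^2 \\
&\;\;\vdots \\
s_N &= o_1^N + o_2^N + o_3^N + \ldots + o_N^N = \sum o_i^N.
\end{aligned}$$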

Moreover, a well known theorem in elementary algebra tells us that every symmetric function may be expressed as a function of $s_1, s_2, s_3, \ldots, s_N$.

From this theorem it follows a fortiori that we are able to express the constants $\lambda$ in the frequency curve as functions of the power sums of the observations. While such a procedure is possible, theoretically at least, we should, however, in most cases find it a very tedious and laborious task in actual practice. It, therefore, remains to be seen whether it is possible to transform these symmetrical functions of the power sums of the observations into some other symmetric functions, which are more flexible and workable in practical computations and which can be expressed in terms of the various values of s.
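The symmetry of the power sums may be illustrated by a short computation. The following is only a sketch in Python: the ages are invented for illustration, and the helper name `power_sums` is an assumption of this example, not a notation of the text. The cards are shuffled and the power sums $s_0, \ldots, s_4$ are formed before and after the shuffle.

```python
import random


def power_sums(observations, max_order):
    """Return [s_0, s_1, ..., s_max_order], where s_r is the sum of o_i**r."""
    return [sum(o ** r for o in observations) for r in range(max_order + 1)]


# Hypothetical attained ages, one per "card" (illustrative data only).
ages = [23, 31, 31, 40, 47, 52, 52, 52, 60, 68]

# Mix the cards; a fixed seed keeps the example reproducible.
shuffled = ages[:]
random.Random(0).shuffle(shuffled)

original_sums = power_sums(ages, 4)
shuffled_sums = power_sums(shuffled, 4)

print(original_sums == shuffled_sums)
print(original_sums[0] == len(ages))
```

Since the observations are whole numbers the comparison is exact, and $s_0$ is simply the number of cards N.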

5. THIELE'S SEMI-INVARIANTS

It is the great achievement of Thiele to have been the first mathematician to realize this possibility and make this transformation by introducing into the theory of frequency curves a peculiar system of symmetrical functions which he called semi-invariants and denoted by the symbols

$$\lambda_1, \lambda_2, \lambda_3, \ldots$$

Starting with power sums, $s_i$, Thiele defines these by the following identity

$$s_0\, e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \ldots} = s_0 + \frac{s_1}{1!}\omega + \frac{s_2}{2!}\omega^2 + \frac{s_3}{3!}\omega^3 + \ldots \qquad (1)$$

which is identical in respect to $\omega$.

Since $s_r = \sum o_i^r$ the right hand side of the equation may also be written as

$$e^{o_1\omega} + e^{o_2\omega} + e^{o_3\omega} + \ldots = \sum e^{o_i\omega}.$$

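Differentiating (1) with respect to $\omega$ we have

$$s_0\, e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \ldots}\left[\lambda_1 + \frac{\lambda_2\omega}{1!} + \frac{\lambda_3\omega^2}{2!} + \ldots\right] = \left[s_0 + \frac{s_1}{1!}\omega + \frac{s_2}{2!}\omega^2 + \frac{s_3}{3!}\omega^3 + \ldots\right]\left[\lambda_1 + \frac{\lambda_2\omega}{1!} + \frac{\lambda_3\omega^2}{2!} + \ldots\right] = s_1 + \frac{s_2}{1!}\omega + \frac{s_3}{2!}\omega^2 + \ldots$$

Multiplying out and equating the various coefficients of equal powers of $\omega$ we finally have

$$\begin{aligned}
s_1 &= \lambda_1 s_0 \\
s_2 &= \lambda_1 s_1 + \lambda_2 s_0 \\
s_3 &= \lambda_1 s_2 + 2\lambda_2 s_1 + \lambda_3 s_0 \\
s_4 &= \lambda_1 s_3 + 3\lambda_2 s_2 + 3\lambda_3 s_1 + \lambda_4 s_0
\end{aligned}$$

where the coefficients follow the law of the binomial theorem.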

Solving for $\lambda$ we have

$$\begin{aligned}
\lambda_1 &= s_1 : s_0 \\
\lambda_2 &= (s_2 s_0 - s_1^2) : s_0^2 \\
\lambda_3 &= (s_3 s_0^2 - 3 s_2 s_1 s_0 + 2 s_1^3) : s_0^3 \\
\lambda_4 &= (s_4 s_0^3 - 4 s_3 s_1 s_0^2 - 3 s_2^2 s_0^2 + 12 s_2 s_1^2 s_0 - 6 s_1^4) : s_0^4
\end{aligned}$$



The semi-invariants $\lambda$ in respect to an arbitrary origin and unit are, as we noted, defined by the relation

$$s_0\, e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \ldots} = e^{o_1\omega} + e^{o_2\omega} + e^{o_3\omega} + \ldots$$

where $o_1, o_2, o_3, \ldots$ are the individual observations.
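The expressions for the first four semi-invariants in terms of the power sums are easily checked on a small set of observations. The following Python sketch uses exact rational arithmetic; the function name `semi_invariants` and the five observations are assumptions made for illustration only.

```python
from fractions import Fraction


def semi_invariants(observations):
    """First four semi-invariants computed from the power sums s_0 ... s_4."""
    s0, s1, s2, s3, s4 = (
        sum(Fraction(o) ** r for o in observations) for r in range(5)
    )
    lam1 = s1 / s0
    lam2 = (s2 * s0 - s1**2) / s0**2
    lam3 = (s3 * s0**2 - 3 * s2 * s1 * s0 + 2 * s1**3) / s0**3
    lam4 = (s4 * s0**3 - 4 * s3 * s1 * s0**2 - 3 * s2**2 * s0**2
            + 12 * s2 * s1**2 * s0 - 6 * s1**4) / s0**4
    return lam1, lam2, lam3, lam4


observations = [1, 2, 3, 4, 5]          # illustrative values only
lam1, lam2, lam3, lam4 = semi_invariants(observations)

# The relation s_4 = lam1*s_3 + 3*lam2*s_2 + 3*lam3*s_1 + lam4*s_0 serves as a check.
s = [sum(Fraction(o) ** r for o in observations) for r in range(5)]
check = lam1 * s[3] + 3 * lam2 * s[2] + 3 * lam3 * s[1] + lam4 * s[0]
print(lam1, lam2, lam3, lam4, check == s[4])
```

The first semi-invariant is the mean of the observations and the second is the mean square deviation from that mean.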

Let us now change to another coordinate system with another unit and origin defined by the following linear transformations:

$$o'_i = a\,o_i + c \qquad (i = 1, 2, 3, \ldots).$$

The semi-invariants in this new system are given by the relation

$$s_0\, e^{\frac{\lambda'_1\omega}{1!} + \frac{\lambda'_2\omega^2}{2!} + \frac{\lambda'_3\omega^3}{3!} + \ldots} = e^{o'_1\omega} + e^{o'_2\omega} + e^{o'_3\omega} + \ldots = e^{(a o_1 + c)\omega} + e^{(a o_2 + c)\omega} + \ldots$$

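Since the various values of $\lambda'$ do not depend upon the quantity $\omega$ we may without changing the value of the semi-invariants replace $\omega$ by $\omega : a$ in the above equations, which gives

$$s_0\, e^{\frac{\lambda'_1\omega}{a\,1!} + \frac{\lambda'_2\omega^2}{a^2\,2!} + \frac{\lambda'_3\omega^3}{a^3\,3!} + \ldots} = e^{(a o_1 + c)\frac{\omega}{a}} + e^{(a o_2 + c)\frac{\omega}{a}} + e^{(a o_3 + c)\frac{\omega}{a}} + \ldots = e^{\frac{c\omega}{a}}\left[e^{o_1\omega} + e^{o_2\omega} + e^{o_3\omega} + \ldots\right] = e^{\frac{c\omega}{a}}\, s_0\, e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \ldots}$$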

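Taking the logarithms on both sides of the equation we have

$$\frac{\lambda'_1\omega}{a\,1!} + \frac{\lambda'_2\omega^2}{a^2\,2!} + \frac{\lambda'_3\omega^3}{a^3\,3!} + \ldots = \frac{c\omega}{a} + \frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \ldots$$

Differentiating successively with respect to $\omega$ we have

$$\begin{aligned}
\frac{\lambda'_1}{a} + \frac{\lambda'_2\omega}{a^2\,1!} + \frac{\lambda'_3\omega^2}{a^3\,2!} + \ldots &= \frac{c}{a} + \lambda_1 + \lambda_2\omega + \frac{\lambda_3\omega^2}{2!} + \ldots \\
\frac{\lambda'_2}{a^2} + \frac{\lambda'_3\omega}{a^3} + \ldots &= \lambda_2 + \lambda_3\omega + \ldots \\
\frac{\lambda'_3}{a^3} + \ldots &= \lambda_3 + \ldots
\end{aligned}$$

Letting $\omega = 0$ we therefore have

$$\begin{aligned}
\frac{\lambda'_1}{a} &= \frac{c}{a} + \lambda_1 \quad\text{or}\quad \lambda'_1 = a\lambda_1 + c \\
\frac{\lambda'_2}{a^2} &= \lambda_2 \quad\text{or}\quad \lambda'_2 = a^2\lambda_2 \\
\frac{\lambda'_3}{a^3} &= \lambda_3 \quad\text{or}\quad \lambda'_3 = a^3\lambda_3
\end{aligned}$$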

from which we deduce the following relations

$$\lambda_1(ax + c) = a\lambda_1(x) + c$$

$$\lambda_r(ax + c) = a^r\lambda_r(x) \quad\text{for } r > 1,$$

which shows how the semi-invariants change by introducing a new origin and a new unit.
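The behavior of the semi-invariants under a change of origin and unit can be verified numerically. The sketch below is an illustration only: the observations, the constants a and c, and the function name `semi_invariants` are assumptions of this example. It computes the first four semi-invariants before and after the substitution $o' = ao + c$.

```python
from fractions import Fraction


def semi_invariants(observations):
    """First four semi-invariants computed from the power sums s_0 ... s_4."""
    s0, s1, s2, s3, s4 = (
        sum(Fraction(o) ** r for o in observations) for r in range(5)
    )
    return (
        s1 / s0,
        (s2 * s0 - s1**2) / s0**2,
        (s3 * s0**2 - 3 * s2 * s1 * s0 + 2 * s1**3) / s0**3,
        (s4 * s0**3 - 4 * s3 * s1 * s0**2 - 3 * s2**2 * s0**2
         + 12 * s2 * s1**2 * s0 - 6 * s1**4) / s0**4,
    )


observations = [2, 3, 3, 5, 8, 13]        # illustrative values only
a, c = Fraction(5, 2), Fraction(-7)       # hypothetical new unit and new origin

old = semi_invariants(observations)
new = semi_invariants([a * o + c for o in observations])

print(new[0] == a * old[0] + c)
print([new[r - 1] == a**r * old[r - 1] for r in (2, 3, 4)])
```

Only the first semi-invariant feels the shift c; those of higher order are multiplied by the corresponding power of a.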

We shall for the present leave the semi-invariants and only ask the reader to bear in mind the above relations between $\lambda$ and s, of which we shall later on make use in determining the constants in the frequency curve $\varphi(x)$.

6. THE FOURIER INTEGRALS

Before discussing the generation of the total frequency curve it will, however, be necessary to demonstrate some auxiliary mathematical formulae from the theory of definite integrals and integral equations which will be of use in the following discussion as mathematical tools with which to attack the collected statistical data or the numerical observations.

One of these tools is found in the celebrated integral theorem by Fourier, which was the first integral equation to be successfully treated. We shall in the following demonstration adhere to the elegant and simple solution by M. Charlier. Charlier in his proof supposes that a function, $F(\omega)$, is defined through the following convergent series:

$$F(\omega) = \alpha\left[f(0) + f(\alpha)e^{\alpha\omega i} + f(2\alpha)e^{2\alpha\omega i} + \ldots + f(-\alpha)e^{-\alpha\omega i} + f(-2\alpha)e^{-2\alpha\omega i} + \ldots\right]$$

or

$$F(\omega) = \alpha\sum_{m=-\infty}^{m=+\infty} f(\alpha m)\,e^{\alpha m\omega i} \qquad (2)$$

where $i = \sqrt{-1}$.

We then see by the well known theorem of Cauchy that the integral

$$\mathfrak{F}(\omega) = \int_{-\infty}^{+\infty} f(x)\,e^{x\omega i}\,dx \qquad (3)$$

is finite and convergent. If we now let $m\alpha = x$ and let $\alpha = 0$ as a limiting value, $\alpha$ becomes equal to dx and $f(\alpha m) = f(x)$. Consequently we may write

$$\lim_{\alpha = 0} F(\omega) = \mathfrak{F}(\omega).$$

Multiplying (2) by $e^{-r\alpha\omega i}\,d\omega$ and integrating between the limits $-\pi/\alpha$ and $+\pi/\alpha$ we get on the left an expression of the form

$$\int_{-\pi/\alpha}^{+\pi/\alpha} F(\omega)\,e^{-r\alpha\omega i}\,d\omega$$

and on the right a sum of definite integrals of which, however, all but the term containing $f(r\alpha)$ as a factor will vanish. This particular term reduces to

$$\alpha\int_{-\pi/\alpha}^{+\pi/\alpha} f(r\alpha)\,d\omega \quad\text{or}\quad 2\pi f(r\alpha).$$

Hence we have

$$f(r\alpha) = \frac{1}{2\pi}\int_{-\pi/\alpha}^{+\pi/\alpha} F(\omega)\,e^{-r\alpha\omega i}\,d\omega. \qquad (4a)$$

By letting $\alpha$ converge toward zero and by the substitution $r\alpha = x$ this equation reduces to

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} \mathfrak{F}(\omega)\,e^{-x\omega i}\,d\omega. \qquad (4b)$$

Charlier has suggested the name conjugated Fourier function of f(x) for the limiting expression $\mathfrak{F}(\omega)$ of (3). We then have, if we introduce a new function $\psi(\omega)$ defined by the simple relation

$$\sqrt{2\pi}\,\psi(\omega) = \lim_{\alpha = 0} F(\omega),$$

$$\psi(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} f(x)\,e^{x\omega i}\,dx \qquad (5a)$$

$$f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} \psi(\omega)\,e^{-x\omega i}\,d\omega. \qquad (5b)$$



The equations (5a) and (5b) are known as integral equations of the first kind. The expression $e^{x\omega i}$ (or $e^{-x\omega i}$) is known as the nucleus of the equation. If in (5b) we know the value of $\psi(\omega)$ we are able to determine f(x). Inversely, if we know f(x) we may find $\psi(\omega)$ from (5a).
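The reciprocity of (5a) and (5b) may be tried out numerically. The sketch below is an illustration under stated assumptions: the test function $f(x) = e^{-x^2/2}$, the integration range from -10 to +10, the step 0.05 and all function names are choices of this example and not part of Charlier's proof. The infinite integrals are replaced by a trapezoidal sum, which is adequate here because the integrand is negligible outside the chosen range.

```python
import cmath
import math

STEP = 0.05
GRID = [k * STEP for k in range(-200, 201)]     # abscissae from -10 to +10


def trapezoid(values, step):
    """Trapezoidal rule for equally spaced ordinates (real or complex)."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def f(x):
    """Assumed test function; its conjugate function is again exp(-w*w/2)."""
    return math.exp(-x * x / 2.0)


def psi(omega):
    """Equation (5a): psi(w) = (1/sqrt(2 pi)) * integral of f(x) e^{x w i} dx."""
    values = [f(x) * cmath.exp(1j * x * omega) for x in GRID]
    return trapezoid(values, STEP) / math.sqrt(2.0 * math.pi)


PSI_TABLE = [psi(w) for w in GRID]


def f_recovered(x):
    """Equation (5b): f(x) = (1/sqrt(2 pi)) * integral of psi(w) e^{-x w i} dw."""
    values = [p * cmath.exp(-1j * x * w) for p, w in zip(PSI_TABLE, GRID)]
    return trapezoid(values, STEP) / math.sqrt(2.0 * math.pi)


print(abs(psi(1.0) - math.exp(-0.5)))
print(abs(f_recovered(0.7) - f(0.7)))
```

Both printed differences should be vanishingly small, the first confirming (5a) for this particular f(x) and the second confirming that (5b) returns the original function.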

7. FREQUENCY CURVE AS THE SOLUTION OF AN INTEGRAL EQUATION

We are now in a position to make use of the semi-invariants of Thiele, which hitherto in our discussion have appeared as a rather disconnected and alien member. On page 13 we saw that the semi-invariants could be expressed by the relation

$$s_0\, e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \ldots} = \sum e^{o_i\omega}$$

where $o_i$ $(i = 1, 2, 3, \ldots)$ denotes the individual observations.

The definition of the semi-invariants does not necessitate that all the o's must be different. If some of the o's are exactly alike it is self-evident that the term $e^{o_i\omega}$ must be repeated as often as $o_i$ occurs among all of the observations. If therefore $N\varphi(o_i)$ denotes the absolute frequency of $o_i$, where $\varphi(o_i)$ is the relative frequency function, then the definition of the semi-invariants may be written as:

$$e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \ldots}\sum \varphi(o_i) = \sum \varphi(o_i)\,e^{o_i\omega}.$$

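For continuous variates, x, the above sums are transformed into definite integrals of the form

$$e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \ldots}\int_{-\infty}^{+\infty}\varphi(x)\,dx = \int_{-\infty}^{+\infty}\varphi(x)\,e^{x\omega}\,dx.$$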



Let us now substitute the quantity $\omega\sqrt{-1}$, or $i\omega$, for $\omega$ in the above identity. We then have:

$$e^{\frac{\lambda_1}{1!}i\omega + \frac{\lambda_2}{2!}i^2\omega^2 + \frac{\lambda_3}{3!}i^3\omega^3 + \ldots}\int_{-\infty}^{+\infty}\varphi(x)\,dx = \int_{-\infty}^{+\infty}\varphi(x)\,e^{ix\omega}\,dx$$

under the supposition that this transformation holds in the complex region in which the function is defined.

In this equation the definite integrals are of special importance. The factor $\int\varphi(x)\,dx$ is, of course, equal to unity according to the simple considerations set forth on page seven. The integral on the right hand side of the equation is, however, apart from the constant factor $\sqrt{2\pi}$ nothing more than the $\psi$ function in the conjugate Fourier function if we let $\varphi(x) = f(x)$, and

$$e^{\frac{\lambda_1}{1!}i\omega + \frac{\lambda_2}{2!}i^2\omega^2 + \frac{\lambda_3}{3!}i^3\omega^3 + \ldots} = \sqrt{2\pi}\,\psi(\omega).$$

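According to (5b) we may, therefore, write f(x) or $\varphi(x)$ as

$$\varphi(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{\frac{\lambda_1}{1!}i\omega + \frac{\lambda_2}{2!}i^2\omega^2 + \frac{\lambda_3}{3!}i^3\omega^3 + \ldots}\, e^{-x\omega i}\,d\omega$$

as the most general form of the frequency function $\varphi(x)$ expressed by means of semi-invariants.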

8. FIRST APPROXIMATE SOLUTION

The exactness with which $\varphi(x)$ is reproduced depends, of course, upon the number of $\lambda$'s we decide to consider in the above formula. As a first approximation we may omit all $\lambda$'s above the order 2 or all terms in the exponent with indices higher than 2. Bearing in mind that $i^2 = -1$ we therefore have as a first approximation

$$\varphi_0(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{i\omega(\lambda_1 - x) - \frac{\lambda_2}{2}\omega^2}\,d\omega.$$

The above definite integral was first evaluated by Laplace by means of the following elegant analysis. Using the well known Eulerian relation for complex quantities the above integral may be written as

$$\int_{-\infty}^{+\infty} e^{-\frac{\lambda_2\omega^2}{2}}\cos[(\lambda_1 - x)\omega]\,d\omega + i\int_{-\infty}^{+\infty} e^{-\frac{\lambda_2\omega^2}{2}}\sin[(\lambda_1 - x)\omega]\,d\omega.$$

The imaginary member vanishes because the factor $e^{-\frac{\lambda_2\omega^2}{2}}$ is an even function and $\sin[(\lambda_1 - x)\omega]$ an uneven function; the area from $-\infty$ to 0 will therefore equal the area from 0 to $+\infty$, but be opposite in sign, which reduces the total area from $-\infty$ to $+\infty$, or the integral in question, to zero.

In regard to the first term, similar conditions hold except that $\cos[(\lambda_1 - x)\omega]$ is an even function and the integral may hence be written as

$$I = 2\int_0^{\infty} e^{-\frac{\lambda_2\omega^2}{2}}\cos(r\omega)\,d\omega \quad\text{where } r = \lambda_1 - x.$$



Regarding the parameter r as a variable and differentiating I in respect to this variable we have

$$\frac{dI}{dr} = -2\int_0^{\infty} \omega\,e^{-\frac{\lambda_2\omega^2}{2}}\sin(r\omega)\,d\omega.$$

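From this we have by partial integration:

$$\frac{dI}{dr} = \frac{2}{\lambda_2}\Big[e^{-\frac{\lambda_2\omega^2}{2}}\sin(r\omega)\Big]_0^{\infty} - \frac{2r}{\lambda_2}\int_0^{\infty} e^{-\frac{\lambda_2\omega^2}{2}}\cos(r\omega)\,d\omega = -\frac{rI}{\lambda_2}$$

or

$$\frac{1}{I}\frac{dI}{dr} = -\frac{r}{\lambda_2}.$$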


From which we find

$$\log I = -\frac{r^2}{2\lambda_2} + \log A$$

where log A is a constant. Hence we have:

$$I = A\,e^{-\frac{r^2}{2\lambda_2}}.$$

In order to determine A we let r = 0 and we have

$$I_0 = A = 2\int_0^{\infty} e^{-\frac{\lambda_2\omega^2}{2}}\,d\omega = 2\sqrt{\frac{\pi}{2\lambda_2}} = \sqrt{\frac{2\pi}{\lambda_2}}.$$

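This finally gives the expression for $\varphi_0(x)$ in the following form:

$$\varphi_0(x) = \frac{1}{2\pi}\,I = \frac{1}{\sqrt{2\pi\lambda_2}}\,e^{-\frac{(x - \lambda_1)^2}{2\lambda_2}}$$

as a preliminary approximation for the frequency curve $\varphi(x)$.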
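The result of Laplace's analysis can be compared with a direct numerical evaluation of the defining integral. The following Python sketch is an illustration only: the values $\lambda_1 = 1.5$ and $\lambda_2 = 4$, the truncation of the range at -8 and +8, the step 0.01 and the function names are assumptions of this example. The integral is approximated by a trapezoidal sum and set beside the closed form $e^{-(x-\lambda_1)^2/2\lambda_2} : \sqrt{2\pi\lambda_2}$.

```python
import cmath
import math


def phi0_by_integration(x, lam1, lam2, half_width=8.0, step=0.01):
    """(1 / 2 pi) * integral of exp(i w (lam1 - x) - lam2 w^2 / 2) dw, trapezoidal rule."""
    count = int(round(2 * half_width / step))
    total = 0.0 + 0.0j
    for k in range(count + 1):
        omega = -half_width + k * step
        weight = 0.5 if k in (0, count) else 1.0
        total += weight * cmath.exp(1j * omega * (lam1 - x) - lam2 * omega**2 / 2.0)
    return total * step / (2.0 * math.pi)


def phi0_closed_form(x, lam1, lam2):
    """Closed form obtained in the text for the first approximation."""
    return math.exp(-(x - lam1) ** 2 / (2.0 * lam2)) / math.sqrt(2.0 * math.pi * lam2)


LAM1, LAM2 = 1.5, 4.0                      # assumed mean and squared dispersion
for x in (0.0, 1.5, 4.0):
    numeric = phi0_by_integration(x, LAM1, LAM2)
    print(x, numeric.real, phi0_closed_form(x, LAM1, LAM2), abs(numeric.imag))
```

The imaginary part of the numerical value is zero to within rounding, in agreement with the vanishing of the sine integral, and the real part agrees with the closed form.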

The first mathematical deduction of this approximate expression for a frequency curve is found in the monumental work by Laplace on Probabilities, and the function $\varphi_0(x)$ entering in the expression $\varphi_0(x)\,dx$, which gives the probability that the variate will fall between $x - \frac{1}{2}dx$ and $x + \frac{1}{2}dx$, is therefore known as the Laplacean probability function or sometimes as the Normal Frequency Curve of Laplace. The same curve was, as we have previously mentioned, also deduced independently by Gauss in connection with his studies on the distribution of accidental errors in precision measurements.

Laplace's probability function, $\varphi_0(x)$, possesses some remarkable properties which it might be well worth while to consider. Introducing a slightly different system of notation by writing $\lambda_1 = M$ and $\sqrt{\lambda_2} = \sigma$, $\varphi_0(x)$ reduces to the following form:

$$\varphi_0(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x - M)^2}{2\sigma^2}}$$

which is the form introduced by Pearson.

The frequency curve, $\varphi_0(x)$, is here expressed in reference to a Cartesian coordinate system with origin at the zero point of the natural number system and whose unit of measurement is also equivalent to the natural number unit. It is, however, not necessary to use this system in preference to any other system. In fact, we may choose arbitrarily any other origin and any other unit standard without altering the properties of the curve. Suppose, therefore, that we take M as the origin and $\sigma$ as the unit of the system. The frequency function then reduces to

$$\varphi_0(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}.$$

Since the integral of $\varphi_0(x)$ from $-\infty$ to $+\infty$ equals unity the following equation must necessarily hold.

$$\int_{-\infty}^{+\infty} e^{-x^2/2}\,dx = \sqrt{2\pi}.$$

9. DEVELOPMENT BY POLYNOMIALS

The Laplacean Probability Function possesses, however, some other remarkable properties which are of great use in expanding a function in a series. Starting with $\varphi_0(x)$ we may by repeated differentiation obtain its various derivatives. Denoting such derivatives by $\varphi_1(x)$, $\varphi_2(x)$, $\varphi_3(x), \ldots$ respectively we have the following relations. 1)

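$$\begin{aligned}
\varphi_0(x) &= e^{-x^2/2} \\
\varphi_1(x) &= -x\,\varphi_0(x) \\
\varphi_2(x) &= (x^2 - 1)\,\varphi_0(x) \\
\varphi_3(x) &= -(x^3 - 3x)\,\varphi_0(x) \\
\varphi_4(x) &= (x^4 - 6x^2 + 3)\,\varphi_0(x)
\end{aligned}$$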



and in general for the nth derivative:

$$\varphi_n(x) = (-1)^n\left[x^n - \frac{n(n-1)}{2}x^{n-2} + \frac{n(n-1)(n-2)(n-3)}{2\cdot 4}x^{n-4} - \frac{n(n-1)(n-2)(n-3)(n-4)(n-5)}{2\cdot 4\cdot 6}x^{n-6} + \ldots\right]\varphi_0(x).$$



1) In the following computations we have omitted temporarily the constant factor $1 : \sqrt{2\pi}$ of $\varphi_0(x)$ and its derivatives.

It can be readily seen that the derivatives of $\varphi_0(x)$ are represented throughout as products of polynomials of x and the function $\varphi_0(x)$ itself. The various polynomials

$$\begin{aligned}
H_0(x) &= 1 \\
H_1(x) &= -x \\
H_2(x) &= x^2 - 1 \\
H_3(x) &= -(x^3 - 3x) \\
H_4(x) &= x^4 - 6x^2 + 3
\end{aligned}$$

and so forth are generally known as Hermite's polynomials from the name of the French mathematician, Hermite, who first introduced these polynomials in mathematical analysis.

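The following relations can be shown to exist between three consecutive polynomials, taken with the signs given above:

$$H_{n+1}(x) + xH_n(x) + nH_{n-1}(x) = 0$$

(for the polynomials taken without the factor $(-1)^n$ the middle term enters with a minus sign), and

$$\frac{d^2H_n(x)}{dx^2} - x\frac{dH_n(x)}{dx} + nH_n(x) = 0.$$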

A numerical 10 decimal place tabulation of the first six Hermite polynomials for values of x up to 4 and progressing by intervals of 0.01 is given by Jørgensen in his Danish work "Frekvensflader og Korrelation".
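The polynomials can be generated mechanically from their definition. Since $\varphi_{n+1}(x)$ is the derivative of $H_n(x)\varphi_0(x)$, it follows that $H_{n+1}(x) = H'_n(x) - xH_n(x)$, with the alternating signs of the list above. The Python sketch below rests on that observation; the representation of a polynomial as a list of coefficients in ascending powers and all function names are assumptions of this example.

```python
def derivative(p):
    """Derivative of a polynomial given as coefficients in ascending powers."""
    return [k * p[k] for k in range(1, len(p))] or [0]


def times_x(p):
    """Multiply a polynomial by x."""
    return [0] + p


def subtract(p, q):
    """Coefficient-wise difference p - q, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a - b for a, b in zip(p, q)]


def hermite_signed(max_order):
    """H_0 ... H_max_order with the signs of the text (H_1 = -x, H_3 = -(x^3 - 3x))."""
    polys = [[1]]
    for _ in range(max_order):
        h = polys[-1]
        polys.append(subtract(derivative(h), times_x(h)))
    return polys


def evaluate(p, x):
    """Horner evaluation of a polynomial at x."""
    result = 0
    for coeff in reversed(p):
        result = result * x + coeff
    return result


H = hermite_signed(6)
for n, poly in enumerate(H):
    print(n, poly, evaluate(poly, 2))
```

A tabulation such as the one mentioned above could be produced by calling `evaluate` over a grid of x values.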

There exist now some very important relations between the Hermite polynomials and the derivatives of $\varphi_0(x)$, or between $H_n(x)$ and $\varphi_n(x)$.

Consider for the moment the two following series of functions

$$\varphi_0(x),\ \varphi_1(x),\ \varphi_2(x),\ \varphi_3(x),\ \varphi_4(x), \ldots$$
$$H_0(x),\ H_1(x),\ H_2(x),\ H_3(x),\ H_4(x), \ldots$$

where $\varphi_n(x) = H_n(x)\,\varphi_0(x)$ and where $\lim \varphi_n(x) = 0$ for $x = \pm\infty$.

We shall now prove that the two series $\varphi_n(x)$ and $H_n(x)$ form a biorthogonal system in the interval $-\infty$ to $+\infty$, that is to say that they are

(1) real and continuous in the whole plane

(2) no one of them is identically zero in the plane

(3) every pair of them, $\varphi_n(x)$ and $H_m(x)$, satisfy the relation

$$\int_{-\infty}^{+\infty} \varphi_n(x)\,H_m(x)\,dx = 0 \qquad (n \neq m).$$

We have the self evident relation (letting x = z)

$$\int_{-\infty}^{+\infty} H_m(z)\,\varphi_n(z)\,dz = \int_{-\infty}^{+\infty} H_m(z)\,H_n(z)\,\varphi_0(z)\,dz = \int_{-\infty}^{+\infty} H_n(z)\,\varphi_m(z)\,dz.$$

Since this relation holds for all values of m and n it is only necessary to prove the proposition for n > m. For if it holds for n > m it will according to the above relation also hold for n < m.

By partial integration we have:

$$\int_{-\infty}^{+\infty} H_m(z)\,\varphi_n(z)\,dz = \Big[H_m(z)\,\varphi_{n-1}(z)\Big]_{-\infty}^{+\infty} - \int_{-\infty}^{+\infty} H'_m(z)\,\varphi_{n-1}(z)\,dz$$

where $H'_m(z)$ is the first derivative of $H_m(z)$.

The first member on the right reduces to 0 since $\varphi_{n-1}(z) = 0$ for $z = \pm\infty$. We have therefore:

$$\begin{aligned}
\int_{-\infty}^{+\infty} H_m(z)\,\varphi_n(z)\,dz &= -\int_{-\infty}^{+\infty} H'_m(z)\,\varphi_{n-1}(z)\,dz \\
\int_{-\infty}^{+\infty} H'_m(z)\,\varphi_{n-1}(z)\,dz &= -\int_{-\infty}^{+\infty} H''_m(z)\,\varphi_{n-2}(z)\,dz \\
\int_{-\infty}^{+\infty} H''_m(z)\,\varphi_{n-2}(z)\,dz &= -\int_{-\infty}^{+\infty} H'''_m(z)\,\varphi_{n-3}(z)\,dz.
\end{aligned}$$

Continuing this process we obtain finally an expression of the form

$$\int_{-\infty}^{+\infty} H_m(z)\,\varphi_n(z)\,dz = (-1)^{m+1}\int_{-\infty}^{+\infty} H_m^{(m+1)}(z)\,\varphi_{n-m-1}(z)\,dz,$$

where $H_m^{(m+1)}(z)$ is the (m + 1)th derivative of $H_m(z)$ and $n - m - 1 \geq 0$. Since $H_m(z)$ is a polynomial of the mth degree its (m + 1)th derivative is zero and we have finally that

$$\int_{-\infty}^{+\infty} H_m(z)\,\varphi_n(z)\,dz = 0$$

for all values of m and n where $n \neq m$.

For m = n we proceed in exactly the same manner, but stop at the mth integration. We have, therefore, by replacing m by n in the above partial integrations

$$\int_{-\infty}^{+\infty} H_n(z)\,\varphi_n(z)\,dz = (-1)^n\int_{-\infty}^{+\infty} H_n^{(n)}(z)\,\varphi_{n-n}(z)\,dz.$$

The nth derivative of $H_n(z)$ is, however, nothing but a constant and equal to $(-1)^n\,n!$. Hence we have finally

$$\int_{-\infty}^{+\infty} H_n(z)\,\varphi_n(z)\,dz = (-1)^n(-1)^n\,n!\int_{-\infty}^{+\infty} e^{-z^2/2}\,dz = n!\,\sqrt{2\pi}.$$

The above analysis thus proves that the functions $H_m(z)$ and $\varphi_n(z)$ are biorthogonal to each other for all values of n different from m throughout the whole plane.
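The biorthogonal relations may be checked by numerical integration for the first few orders. The sketch below is an illustration under stated assumptions: the polynomials are written out with the signs of the list given earlier, $\varphi_0(z)$ is taken as $e^{-z^2/2}$ without the constant factor, and the range of integration (from -12 to +12), the step 0.01 and the function names are choices of this example.

```python
import math

# Hermite polynomials with the alternating signs used in the text.
H = [
    lambda z: 1.0,
    lambda z: -z,
    lambda z: z**2 - 1.0,
    lambda z: -(z**3 - 3.0 * z),
    lambda z: z**4 - 6.0 * z**2 + 3.0,
]


def phi(n, z):
    """n-th derivative of exp(-z^2/2), written as H_n(z) * exp(-z^2/2)."""
    return H[n](z) * math.exp(-z * z / 2.0)


def inner(m, n, half_width=12.0, step=0.01):
    """Trapezoidal approximation of the integral of H_m(z) * phi_n(z) over the real line."""
    count = int(round(2 * half_width / step))
    total = 0.0
    for k in range(count + 1):
        z = -half_width + k * step
        weight = 0.5 if k in (0, count) else 1.0
        total += weight * H[m](z) * phi(n, z)
    return total * step


for m in range(5):
    print([round(inner(m, n), 6) for n in range(5)])
```

The printed array should be diagonal, with the diagonal entries equal to $n!\sqrt{2\pi}$, that is approximately 2.506628, 2.506628, 5.013257, 15.039770 and 60.159079.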

We can now make use of these relations between the infinite set of biorthogonal functions $H_m(z)$ and $\varphi_n(z)$ in solving the problem of expanding an arbitrary function $\varphi(z)$ in a series of the form

$$\varphi(z) = c_0\varphi_0(z) + c_1\varphi_1(z) + c_2\varphi_2(z) + \ldots$$

the series to hold in the interval from $-\infty$ to $+\infty$.

If we know that $\varphi(z)$ can be developed into a series of this form, which after multiplication by any continuous function can be integrated term for term, then we are able to give a formal determination of the coefficients c.

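This formal determination of any one of the c's, say $c_i$, consists in multiplying the above series by $H_i(z)$ and integrating each term from $-\infty$ to $+\infty$. All the terms except the one containing the product $H_i(z)\,\varphi_i(z)$ vanish and we have for $c_i$:

$$c_i = \frac{\int_{-\infty}^{+\infty}\varphi(z)\,H_i(z)\,dz}{\int_{-\infty}^{+\infty}\varphi_i(z)\,H_i(z)\,dz} = \frac{\int_{-\infty}^{+\infty}\varphi(z)\,H_i(z)\,dz}{i!\,\sqrt{2\pi}}.$$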

If we define the Hermite functions as

$$\begin{aligned}
H_0(z) &= 1 \\
H_1(z) &= z \\
H_2(z) &= z^2 - 1 \\
H_3(z) &= z^3 - 3z \\
H_4(z) &= z^4 - 6z^2 + 3
\end{aligned}$$

the above formula takes on the form

$$c_i = \frac{\int_{-\infty}^{+\infty}\varphi(z)\,H_i(z)\,dz}{\int_{-\infty}^{+\infty}\varphi_i(z)\,H_i(z)\,dz} = \frac{\int_{-\infty}^{+\infty}\varphi(z)\,H_i(z)\,dz}{(-1)^i\,i!\,\sqrt{2\pi}}$$

which we shall prefer to use in the following discussion.

It will be noted that this purely formal calculation of the coefficients c is very similar to the determination of the constants in a Fourier Series, where as a matter of fact the system of functions

$$\cos z,\ \cos 2z,\ \cos 3z, \ldots$$
$$\sin z,\ \sin 2z,\ \sin 3z, \ldots$$

is biorthogonal in the interval $0 < z < 2\pi$.

But the reader must not forget that the above representation is only a formal one, and we do not know if it is valid. To prove its validity we must first show that the series is convergent and secondly that it actually represents $\varphi(z)$ for all values of z.

This is by no means a simple task and it cannot be done by elementary methods. A Russian mathematician, Vera Myller-Lebedeff, has, however, given an elegant solution by means of some well known theorems from the Fredholm integral equations. She has among other things proved the following criterion:

"Every function $\varphi(z)$ which together with its first two derivatives is finite and continuous in the interval from $-\infty$ to $+\infty$ and which vanishes together with its derivatives for $z = \pm\infty$ can be developed into an infinite series of the form:

$$\varphi(z) = \sum c_i\,e^{-z^2/2}\,H_i(z)$$

where $H_i(z)$ is the Hermite polynomial of order i".



10. GRAM'S SERIES

It is, however, not our intention to follow up this treatment, which is outside the scope of an elementary treatise like this, and we shall in its place give an approximate representation of the frequency function, $\varphi(z)$, by a method which in many respects is similar to that introduced by the Danish actuary Gram in his epoch-making work "Udviklingsrækker", which contains the first known systematic development of a skew frequency function. Gram's problem in a somewhat modified form may briefly be stated as follows: Being given an arbitrary relative frequency function, $\varphi(z)$, continuous and finite in the interval $-\infty$ to $+\infty$ (and which vanishes for $z = \pm\infty$), to determine the constant coefficients $c_0, c_1, c_2, c_3, \ldots$ in such a way that the series

$$\frac{c_0\varphi_0(z)}{\sqrt{\varphi_0(z)}} + \frac{c_1\varphi_1(z)}{\sqrt{\varphi_0(z)}} + \frac{c_2\varphi_2(z)}{\sqrt{\varphi_0(z)}} + \ldots + \frac{c_n\varphi_n(z)}{\sqrt{\varphi_0(z)}}$$

gives the best approximation to the quantity $\varphi(z) : \sqrt{\varphi_0(z)}$ in the sense of the method of least squares. That is to say we wish to determine the constants c in such a manner that the sum of the squares of the differences between the function and the approximate series becomes a minimum. This means that the expression

$$\int_{-\infty}^{+\infty}\left[\frac{\varphi(z)}{\sqrt{\varphi_0(z)}} - \sum_{i=0}^{n}\frac{c_i\varphi_i(z)}{\sqrt{\varphi_0(z)}}\right]^2 dz$$

must be a minimum.

On the basis of this condition we have

$$\frac{1}{\sqrt{\varphi_0(z)}}\sum c_i\varphi_i(z) = \sqrt{\varphi_0(z)}\sum c_iH_i(z) = U(z)$$

where the unknown coefficients c must be so determined that

$$I = \int_{-\infty}^{+\infty}\left[\frac{\varphi(z)}{\sqrt{\varphi_0(z)}} - U(z)\right]^2 dz \quad\text{equals a minimum.}$$

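Taking the partial derivatives in respect to $c_i$ we have

$$\frac{\partial I}{\partial c_i} = -2\int_{-\infty}^{+\infty}\frac{\varphi(z)}{\sqrt{\varphi_0(z)}}\,\frac{\partial U(z)}{\partial c_i}\,dz + \frac{\partial}{\partial c_i}\int_{-\infty}^{+\infty}[U(z)]^2\,dz.$$

Now since

$$\int_{-\infty}^{+\infty}[U(z)]^2\,dz = \int_{-\infty}^{+\infty}\left\{c_0^2[H_0(z)]^2 + c_1^2[H_1(z)]^2 + \ldots + c_n^2[H_n(z)]^2\right\}\varphi_0(z)\,dz,$$

we get

$$\frac{\partial I}{\partial c_i} = -2\int_{-\infty}^{+\infty}\frac{\varphi(z)}{\sqrt{\varphi_0(z)}}\,H_i(z)\sqrt{\varphi_0(z)}\,dz + 2c_i\int_{-\infty}^{+\infty}[H_i(z)]^2\,\varphi_0(z)\,dz$$

where the latter integral equals

$$\int_{-\infty}^{+\infty}\varphi_i(z)\,H_i(z)\,dz = (-1)^i\,i!\,\sqrt{2\pi}.$$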

Equating to zero and solving for $c_i$ we finally obtain the following value for $c_i$:

$$c_i = \frac{1}{(-1)^i\,i!\,\sqrt{2\pi}}\int_{-\infty}^{+\infty}\varphi(z)\,H_i(z)\,dz \qquad (i = 1, 2, 3, \ldots).$$

This solution is gotten by the introduction of $\sqrt{\varphi_0(z)}$, which serves to make all terms of the form $c_i\varphi_i(z) : \sqrt{\varphi_0(z)} = \sqrt{\varphi_0(z)}\,c_iH_i(z)$ $(i = 1, 2, 3, \ldots, n)$ orthogonal to each other in the interval $-\infty$ to $+\infty$.

In all the above expansions of a frequency series we have used the expression $\varphi_0(z) = e^{-z^2/2}$ as the generating function (see footnote on page 26), while as a matter of fact the true value of $\varphi_0(z)$ is given by the equation $\varphi_0(z) = e^{-z^2/2} : \sqrt{2\pi}$.

The definite integral on page 32

$$(-1)^i\int_{-\infty}^{+\infty}H_i(z)\,\varphi_i(z)\,dz = i!\int_{-\infty}^{+\infty}e^{-z^2/2}\,dz = i!\,\sqrt{2\pi}$$

will therefore have to be divided by $\sqrt{2\pi}$, and the value of the general coefficient $c_i$ will henceforth be reduced to

$$c_i = \frac{\int_{-\infty}^{+\infty}\varphi(z)\,H_i(z)\,dz}{(-1)^i\,i!}$$

where $H_i(z)$ is the Hermite polynomial of order i defined by the relation

$$H_i(z) = z^i - \frac{i(i-1)}{2}z^{i-2} + \frac{i(i-1)(i-2)(i-3)}{2\cdot 4}z^{i-4} - \frac{i(i-1)(i-2)(i-3)(i-4)(i-5)}{2\cdot 4\cdot 6}z^{i-6} + \ldots$$

On this basis we obtain the following values for the first four coefficients:

$$\begin{aligned}
c_0 &= \int_{-\infty}^{+\infty}\varphi(z)\,dz = 1 \\
c_1 &= (-1)^1\int_{-\infty}^{+\infty}\varphi(z)\,z\,dz : 1! \\
c_2 &= (-1)^2\int_{-\infty}^{+\infty}(z^2 - 1)\,\varphi(z)\,dz : 2! \\
c_3 &= (-1)^3\int_{-\infty}^{+\infty}(z^3 - 3z)\,\varphi(z)\,dz : 3! \\
c_4 &= (-1)^4\int_{-\infty}^{+\infty}(z^4 - 6z^2 + 3)\,\varphi(z)\,dz : 4!
\end{aligned}$$
While the above development of an arbitrary frequency distribution has reference to $\varphi(z)$, or the relative frequency function, it is, however, equally well adapted to the representation of absolute frequencies as expressed by the function, F(z). If N is the total number of individual observations, or in other words the area of the frequency curve, we evidently have

$$F(z) = N\varphi(z) \quad\text{or}\quad \int_{-\infty}^{+\infty}F(z)\,dz = N\int_{-\infty}^{+\infty}\varphi(z)\,dz = N.$$

Since N is a constant quantity we may, therefore, write the expansion of F(z) as follows:

$$F(z) = N\left[c_0\varphi_0(z) + c_1\varphi_1(z) + c_2\varphi_2(z) + \ldots\right] = N\sum c_i\,\varphi_i(z), \qquad \varphi_i(z) = (-1)^iH_i(z)\,\varphi_0(z),$$

where the coefficients $c_i$ have the value

$$c_i = \frac{(-1)^i}{N\,i!}\int_{-\infty}^{+\infty}F(z)\,H_i(z)\,dz \quad\text{for } i = 1, 2, 3, \ldots$$

and where

$$N = \int_{-\infty}^{+\infty}F(z)\,dz.$$

Since all the Hermite functions are polynomials in z, it can be readily seen that the coefficients c may be expressed as functions of the power sums or of the previously mentioned symmetrical functions s, where

$$s_r = \int_{-\infty}^{+\infty} z^r\,F(z)\,dz.$$

These particular integrals, originally introduced by Thiele in the development of the semi-invariants, have been called by Pearson the "moments" of the frequency function, F(z), and $s_r$ is called the r-th moment of the variate z with respect to an arbitrary origin.

It can be readily seen that the moment of order zero, or $s_0$, is

$$s_0 = \int_{-\infty}^{+\infty} z^0\,F(z)\,dz = N = N\int_{-\infty}^{+\infty}\varphi(z)\,dz.$$

Hence we have for the first coefficient $c_0$

$$c_0 = \int_{-\infty}^{+\infty}F(z)\,dz : \int_{-\infty}^{+\infty}F(z)\,dz = 1.$$

We are, however, in a position to further simplify the expression for F(z).

As already mentioned we are at liberty to choose arbitrarily both the origin and the unit of the Cartesian coordinate system for the frequency curve without changing the properties of this curve. Now by making a proper choice of the Cartesian system of reference we can make the coefficients $c_1$ and $c_2$ vanish. In order to obtain this object the origin of the system must be so chosen that

$$c_1 = -\frac{1}{1!}\int_{-\infty}^{+\infty} z\,F(z)\,dz : \int_{-\infty}^{+\infty}F(z)\,dz = 0.$$

This means that the semi-invariant $s_1 : s_0 = \lambda_1$ must vanish. It can be readily seen that the above expression for $\lambda_1$ is nothing more than the usual form for the mean value of a series of variates. Moreover, we know that the algebraic sum (or in the case of continuous variates, the integral) of the variates around the mean value is always equal to zero. Hence by writing for z the expression (z - M), where M equals the mean value or $\lambda_1$, we can always make $c_1$ vanish.

To attain our second object of making $c_2$ vanish we must choose the unit of the coordinate system in such a way that the expression

$$c_2 = \frac{(-1)^2}{2!}\int_{-\infty}^{+\infty}F(z)\,H_2(z)\,dz : \int_{-\infty}^{+\infty}F(z)\,dz = 0$$

which implies that

$$\left[\int_{-\infty}^{+\infty}F(z)\,z^2\,dz - \int_{-\infty}^{+\infty}F(z)\,dz\right] : \int_{-\infty}^{+\infty}F(z)\,dz = 0$$

or that $s_2 : s_0 - 1 = 0$, or when expressed in terms of the semi-invariants that

$$\lambda_2 = (s_2s_0 - s_1^2) : s_0^2 = 1.$$

But by choosing the mean as the origin of the system the term $s_1 : s_0$ is equal to 0 and we have therefore $\lambda_2 = \sigma^2 = s_2 : s_0 = 1$. Hence, by selecting as the unit of our coordinate system $\sqrt{\lambda_2}$ or $\sigma$, where $\sigma$ is technically known as the dispersion or standard deviation of the series of variates, we can make the second coefficient $c_2$ vanish.

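In respect to the coefficients $c_3$ and $c_4$ we have now

$$c_3 = \frac{(-1)^3}{3!}\left[\int_{-\infty}^{+\infty} z^3F(z)\,dz - 3\int_{-\infty}^{+\infty} zF(z)\,dz\right] : \int_{-\infty}^{+\infty}F(z)\,dz$$

which reduces to

$$c_3 = -\frac{1}{3!}\,\frac{s_3}{s_0},$$

while

$$c_4 = \frac{(-1)^4}{4!}\left[\int_{-\infty}^{+\infty} z^4F(z)\,dz - 6\int_{-\infty}^{+\infty} z^2F(z)\,dz + 3\int_{-\infty}^{+\infty}F(z)\,dz\right] : \int_{-\infty}^{+\infty}F(z)\,dz$$

which reduces to

$$c_4 = \frac{1}{4!}\left[\frac{s_4}{s_0} - 6\frac{s_2}{s_0} + 3\frac{s_0}{s_0}\right] = \frac{1}{4!}\left[\frac{s_4}{s_0} - 3\right].$$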

While the coefficients of higher order may be 
determined with equal ease, it will in general be 
found that the majority of moderately skew fre- 
quency distributions can be expressed by means 
of the first 4 parameters or coefficients. 
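The practical use of these coefficients may be sketched on a small grouped distribution. The Python example below is an illustration under stated assumptions: the frequencies are invented (they are the binomial coefficients 1, 4, 6, 4, 1), the names `gram_charlier_fit` and `fitted_frequency` are hypothetical, and the fitted curve is expressed per unit of the original variate x, so that the factor $N : \sigma$ appears in place of N. The variate is first referred to the mean as origin and the dispersion as unit, after which $c_1$ and $c_2$ vanish and only $c_3$ and $c_4$ remain to be computed from the moments.

```python
import math


def gram_charlier_fit(values, counts):
    """Mean, dispersion and the coefficients c_1 ... c_4 of a grouped distribution."""
    n_total = sum(counts)                                        # s_0 = N
    mean = sum(x * f for x, f in zip(values, counts)) / n_total
    variance = sum((x - mean) ** 2 * f for x, f in zip(values, counts)) / n_total
    sigma = math.sqrt(variance)

    def moment(r):
        """s_r : s_0 for the standardized variate z = (x - M) / sigma."""
        return sum(((x - mean) / sigma) ** r * f
                   for x, f in zip(values, counts)) / n_total

    return {
        "N": n_total, "M": mean, "sigma": sigma,
        "c1": -moment(1),
        "c2": (moment(2) - 1.0) / 2.0,
        "c3": -moment(3) / 6.0,
        "c4": (moment(4) - 3.0) / 24.0,
    }


def fitted_frequency(x, fit):
    """N/sigma * [phi_0(z) + c_3 phi_3(z) + c_4 phi_4(z)] with z = (x - M)/sigma."""
    z = (x - fit["M"]) / fit["sigma"]
    phi0 = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
    phi3 = -(z**3 - 3.0 * z) * phi0
    phi4 = (z**4 - 6.0 * z**2 + 3.0) * phi0
    return fit["N"] / fit["sigma"] * (phi0 + fit["c3"] * phi3 + fit["c4"] * phi4)


values = [0, 1, 2, 3, 4]
counts = [1, 4, 6, 4, 1]            # illustrative frequencies only
fit = gram_charlier_fit(values, counts)
for x, observed in zip(values, counts):
    print(x, observed, round(fitted_frequency(x, fit), 3))
```

For this symmetrical example $c_3$ vanishes and $c_4$ is a small negative quantity, so the fitted curve is slightly flatter at the mean than the normal curve with the same dispersion.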



11. COEFFICIENTS EXPRESSED AS SEMI-INVARIANTS

We shall now show how the same results for the values of the coefficients may be obtained from the definition of the semi-invariants. Since we have proven that a frequency function, F(z), may be expressed by the series

$$F(z) = N\sum c_i\,\varphi_i(z)$$

we may from the definition of the semi-invariants write down the following identity:

$$s_0\,e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \ldots} = N\int_{-\infty}^{+\infty} e^{z\omega}\left(c_0\varphi_0(z) + c_1\varphi_1(z) + c_2\varphi_2(z) + \ldots\right)dz$$

where N is the area of the frequency curve.

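The general term on the right hand side of the equation will be of the form

$$c_r\int_{-\infty}^{+\infty} e^{z\omega}\,\varphi_r(z)\,dz$$

where the integral may be evaluated by partial integration as follows:

$$\int_{-\infty}^{+\infty} e^{z\omega}\,\varphi_r(z)\,dz = \Big[e^{z\omega}\,\varphi_{r-1}(z)\Big]_{-\infty}^{+\infty} - \omega\int_{-\infty}^{+\infty} e^{z\omega}\,\varphi_{r-1}(z)\,dz,$$

and where the first term on the right vanishes leaving

$$\int_{-\infty}^{+\infty} e^{z\omega}\,\varphi_r(z)\,dz = (-\omega)^1\int_{-\infty}^{+\infty} e^{z\omega}\,\varphi_{r-1}(z)\,dz.$$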

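Continuing in the same manner we obtain by successive integrations

$$\begin{aligned}
(-\omega)^1\int_{-\infty}^{+\infty} e^{z\omega}\,\varphi_{r-1}(z)\,dz &= (-\omega)^2\int_{-\infty}^{+\infty} e^{z\omega}\,\varphi_{r-2}(z)\,dz \\
(-\omega)^2\int_{-\infty}^{+\infty} e^{z\omega}\,\varphi_{r-2}(z)\,dz &= (-\omega)^3\int_{-\infty}^{+\infty} e^{z\omega}\,\varphi_{r-3}(z)\,dz
\end{aligned}$$

from which we finally obtain the relation

$$\int_{-\infty}^{+\infty} e^{z\omega}\,\varphi_r(z)\,dz = (-\omega)^r\int_{-\infty}^{+\infty} e^{z\omega}\,\varphi_0(z)\,dz = \frac{(-\omega)^r}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{z\omega - \frac{z^2}{2}}\,dz.$$

This latter integral may be written as

$$\frac{1}{\sqrt{2\pi}}\,e^{\frac{\omega^2}{2}}\int_{-\infty}^{+\infty} e^{-\frac{(z - \omega)^2}{2}}\,dz = e^{\frac{\omega^2}{2}}.$$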

Consequently the relation between the semi-invariants
and the frequency function may be written as follows:

$$s_0\, e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \cdots} = N\, e^{\omega^2/2}\left[c_0 - c_1\omega + c_2\omega^2 - c_3\omega^3 + \cdots\right]$$

or

$$s_0\, e^{\frac{\lambda_1\omega}{1!} + \frac{(\lambda_2 - 1)\,\omega^2}{2!} + \cdots} = N\left[c_0 - c_1\omega + c_2\omega^2 - c_3\omega^3 + \cdots\right].$$
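The integral relation on which this result rests, namely that the integral of $e^{z\omega}\varphi_r(z)$ equals $(-\omega)^r e^{\omega^2/2}$, can be checked numerically. The following is only a sketch under stated assumptions: it takes $\varphi_r$ to be the r-th derivative of the normal function, written as $(-1)^r He_r(z)\varphi_0(z)$ with $He_r$ the Hermite polynomial, and it uses a plain trapezoidal rule on a finite range. The function names and the value of omega are illustrative choices, not part of the text.

```python
import math

def phi0(z):
    """Normal generating function phi_0(z) = exp(-z^2/2) / sqrt(2*pi)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def hermite(r, z):
    """Hermite polynomial He_r(z) from He_{k+1} = z*He_k - k*He_{k-1}."""
    h_prev, h = 1.0, z
    if r == 0:
        return h_prev
    for k in range(1, r):
        h_prev, h = h, z * h - k * h_prev
    return h

def phi(r, z):
    """r-th derivative of phi_0, assumed equal to (-1)^r He_r(z) phi_0(z)."""
    return (-1) ** r * hermite(r, z) * phi0(z)

def integral(r, omega, lo=-12.0, hi=12.0, steps=24000):
    """Trapezoidal estimate of the integral of exp(z*omega) * phi_r(z)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(z * omega) * phi(r, z)
    return total * h

omega = 0.7  # arbitrary illustrative value
checks = []
for r in range(5):
    lhs = integral(r, omega)
    rhs = (-omega) ** r * math.exp(0.5 * omega ** 2)
    checks.append((r, lhs, rhs))
    print(r, round(lhs, 6), round(rhs, 6))
```

The two printed columns should agree for each order r, which is the content of the successive partial integrations.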

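By successive differentiation with respect to $\omega$
and by equating the coefficients of equal powers
of $\omega$ we get in a manner similar to that shown
on page 13 the following results:

$$c_0 = \frac{s_0}{N} = \frac{s_0}{s_0} = 1$$

$$c_1 = -\lambda_1$$

$$c_2 = \frac{1}{2!}\left[(\lambda_2 - 1) + \lambda_1^2\right]$$

$$c_3 = -\frac{1}{3!}\left[\lambda_3 + 3(\lambda_2 - 1)\lambda_1 + \lambda_1^3\right]$$

$$c_4 = \frac{1}{4!}\left[\lambda_4 + 4\lambda_3\lambda_1 + 3(\lambda_2 - 1)^2 + 6(\lambda_2 - 1)\lambda_1^2 + \lambda_1^4\right].$$

If we now again choose the origin at $\lambda_1$, or
let $\lambda_1 = 0$, and choose $\sqrt{\lambda_2} = 1$ as the unit of our
coordinate system we have:

$$c_0 = 1, \quad c_1 = 0, \quad c_2 = 0, \quad c_3 = -\frac{1}{3!}\lambda_3, \quad c_4 = \frac{1}{4!}\lambda_4.$$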
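The formulae for the coefficients can be put into a few lines of code. This is a minimal sketch, assuming the first four semi-invariants are already known; the function name and the numerical values of $\lambda_3$ and $\lambda_4$ in the example are hypothetical and serve only to show the reduction when $\lambda_1 = 0$ and $\lambda_2 = 1$.

```python
from math import factorial

def charlier_coefficients(lam1, lam2, lam3, lam4):
    """Return (c0, c1, c2, c3, c4) from the first four semi-invariants."""
    k2 = lam2 - 1.0  # the combination (lambda_2 - 1) occurs throughout
    c0 = 1.0
    c1 = -lam1
    c2 = (k2 + lam1 ** 2) / factorial(2)
    c3 = -(lam3 + 3 * k2 * lam1 + lam1 ** 3) / factorial(3)
    c4 = (lam4 + 4 * lam3 * lam1 + 3 * k2 ** 2
          + 6 * k2 * lam1 ** 2 + lam1 ** 4) / factorial(4)
    return c0, c1, c2, c3, c4

# Origin at the mean and unit dispersion; lambda_3, lambda_4 are hypothetical.
standardised = charlier_coefficients(0.0, 1.0, -0.6, 0.48)
print(standardised)

# Arbitrary origin and unit (hypothetical values) for comparison.
general = charlier_coefficients(0.5, 2.0, 0.0, 0.0)
print(general)
```

With the standardised input the first three coefficients reduce to 1, 0, 0 and the remaining two to $-\lambda_3/3!$ and $\lambda_4/4!$, as stated above.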
12. LINEAR TRANSFORMATION

The theoretical development of the above formulae explicitly
assumes that the variate, z, is measured in terms of the
dispersion or $\sqrt{\lambda_2(z)}$ and with $\lambda_1(z)$ as the origin of the
coordinate system. In practice the observations or statistical
data are, however, invariably expressed with reference to
an arbitrarily chosen origin (in the majority of
cases the natural zero of the number scale) and
expressed in terms of standard units, such as
centimeters, grams, years, integral numbers, etc.
Let us denote the general variate in such arbitrarily
selected systems of reference by x. Our
problem then consists in transforming the various
semi-invariants, $\lambda_1(x), \lambda_2(x), \lambda_3(x), \lambda_4(x), \ldots$
to the z system of reference with $\lambda_1(z)$
as its origin and $\sqrt{\lambda_2(z)}$ as its unit. Such a
transformation may always be brought about by means
of the linear substitution

$$z = ax + b$$

which in a purely geometrical sense implies both
a change of origin and unit. On page 16 we
proved the following general properties of the
semi-invariants

$$\lambda_1(z) = \lambda_1(ax + b) = a\lambda_1(x) + b$$

$$\lambda_r(z) = \lambda_r(ax + b) = a^r\lambda_r(x).$$




Let us now write $\lambda_1(x) = M$ and $\lambda_2(x) = \sigma^2$,
we then have the following relations:

$$\lambda_1(z) = aM + b$$

$$\lambda_2(z) = a^2\sigma^2.$$

Since the coordinate system of reference must
be chosen in such a manner that $\lambda_1(z) = 0$ and
$\sqrt{\lambda_2(z)} = 1$ we have:

$$aM + b = 0$$

$$a\sigma = 1$$

from which we obtain $a = \frac{1}{\sigma}$ and $b = -\frac{M}{\sigma}$,
which brings z on the form: $z = (x - M) : \sigma$ while
$\varphi_0(z)$ becomes

$$\varphi_0(z) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - M)^2}{2\sigma^2}}.$$

Moreover, we have $\lambda_r(z) = \lambda_r(x) : \sigma^r$ for all
values of $r \geq 2$. We are now able to epitomize
the computations of the semi-invariants under the
following simple rules.

(1) Compute $\lambda_1(x)$ in respect to an arbitrary
origin. The numerical value of this parameter
with opposite sign is the origin of the
frequency curve.

(2) Compute $\lambda_r(x)$ for all values of $r \geq 2$. The
numerical values of those parameters divided
by $\left(\sqrt{\lambda_2(x)}\right)^r$, or $\sigma^r$, for r = 2, 3, 4, . . .
are the semi-invariants of the frequency
curve.
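The two rules amount to a short routine. The sketch below assumes the semi-invariants of x are supplied as a list beginning with $\lambda_1(x)$; the function name and the input values are hypothetical.

```python
import math

def standardise(lams):
    """Transform semi-invariants of x to those of z = a*x + b.

    lams[r - 1] holds lambda_r(x) for r = 1, 2, ..., len(lams).
    Returns (a, b, lam_z) with lam_z[0] = 0 and lam_z[1] = 1.
    """
    mean, variance = lams[0], lams[1]
    sigma = math.sqrt(variance)
    a = 1.0 / sigma
    b = -mean / sigma
    lam_z = [a * mean + b]  # lambda_1(z) = a*M + b
    for r in range(2, len(lams) + 1):
        lam_z.append(lams[r - 1] / sigma ** r)  # lambda_r(z) = lambda_r(x) : sigma^r
    return a, b, lam_z

# Hypothetical semi-invariants of x: M = 2, sigma^2 = 4, lambda_3 = -4, lambda_4 = 8.
a, b, lam_z = standardise([2.0, 4.0, -4.0, 8.0])
print(a, b, lam_z)
```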
13. CHARLIER'S SCHEME OF COMPUTATION

The general formulae for the semi-invariants were given on
page 13. In practical work
it is, however, of importance to proceed along
systematic lines and to furnish an automatic check
for the correctness of the computations. Several
systems facilitating such work have been proposed
by various writers, but the most simple and
elegant is probably the one proposed by M. Charlier
and which is shown in detail with the necessary
control checks on the following page. Charlier
employs moments, while we in the following
demonstration shall prefer the use of the
semi-invariants.

If we define the power sums of the relative
frequencies $\varphi(x)$ by the relation

$$m_r = \int_{-\infty}^{+\infty} x^r F(x)\,dx : \int_{-\infty}^{+\infty} F(x)\,dx \qquad (r = 0, 1, 2, 3, \ldots),$$

we find that the expressions for the semi-invariants
as given on page 13 may be written as follows:

$$\lambda_1 = m_1$$

$$\lambda_2 = m_2 - m_1^2$$

$$\lambda_3 = m_3 - 3m_2m_1 + 2m_1^3$$

$$\lambda_4 = m_4 - 4m_3m_1 - 3m_2^2 + 12m_2m_1^2 - 6m_1^4.$$

The advantage of the Charlier scheme for the
computation of the semi-invariants lies in the fact
that it furnishes an automatic check of the
final results. If we expand the expression
$(x + 1)^4 F(x)$ we have:

$$x^4F(x) + 4x^3F(x) + 6x^2F(x) + 4xF(x) + F(x)$$

or

$$\sum (x + 1)^4 F(x) = s_4 + 4s_3 + 6s_2 + 4s_1 + s_0,$$

which serves as an independent control check of
the computations. Moreover, another check is
furnished by the relation

$$m_4 = \lambda_4 + 4m_1\lambda_3 + 6m_1^2\lambda_2 + 3\lambda_2^2 + m_1^4.$$
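Both control checks are easy to run mechanically. The sketch below is not Charlier's printed schedule; it only evaluates the power sums, the semi-invariants and the two checks for a small frequency table whose values are hypothetical and chosen purely for illustration.

```python
def power_sums(xs, freqs, max_order=4):
    """s_r = sum of x^r * F(x) for r = 0 .. max_order."""
    return [sum(f * x ** r for x, f in zip(xs, freqs))
            for r in range(max_order + 1)]

def semi_invariants(s):
    """First four semi-invariants from the power sums s_0 .. s_4."""
    m1, m2, m3, m4 = (s[r] / s[0] for r in range(1, 5))
    lam1 = m1
    lam2 = m2 - m1 ** 2
    lam3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    lam4 = m4 - 4 * m3 * m1 - 3 * m2 ** 2 + 12 * m2 * m1 ** 2 - 6 * m1 ** 4
    return lam1, lam2, lam3, lam4

# Hypothetical frequency table, x reckoned from a provisional origin.
xs = [-2, -1, 0, 1, 2, 3]
freqs = [3, 12, 30, 25, 8, 2]

s = power_sums(xs, freqs)

# First check: sum of (x + 1)^4 F(x) against the binomial combination of s_r.
check1_lhs = sum(f * (x + 1) ** 4 for x, f in zip(xs, freqs))
check1_rhs = s[4] + 4 * s[3] + 6 * s[2] + 4 * s[1] + s[0]

# Second check: m_4 rebuilt from the semi-invariants.
lam1, lam2, lam3, lam4 = semi_invariants(s)
m1, m4 = s[1] / s[0], s[4] / s[0]
check2 = lam4 + 4 * m1 * lam3 + 6 * m1 ** 2 * lam2 + 3 * lam2 ** 2 + m1 ** 4

print(check1_lhs, check1_rhs)
print(m4, check2)
```

A disagreement in either pair of printed figures points to an arithmetical slip somewhere in the columns of the scheme.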

In order to illustrate the scheme we choose the 
following age distribution of 1130 pensioned func- 
tionaries in a large American Public Utility cor- 
poration. 



Ages       No. of Pensioners      Ages       No. of Pensioners
35-39              1              65-69            286
40-44              6              70-74            248
45-49             17              75-79            128
50-54             48              80-84             38
55-59            118              85-89             13
60-64            224              over 90            3



The complete calculations of the coefficients c 
are shown in the appended scheme by Charlier. 
The above computations give the numerical
values of the frequency function which now
may be written as follows:

$$F(x) = 1130\,\left[\varphi_0(z) + .0258\,\varphi_3(z) + .0158\,\varphi_4(z)\right]$$

where

$$\varphi_0(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x + .0195}{1.6240}\right)^2}.$$

14. COMPARISON BETWEEN OBSERVED AND THEORETICAL VALUES

The next step is now to work out the numerical values of
F(x) for various values of x
and compare such values with the ones originally
observed. This process is shown in detail in the
following scheme.

Column (1) gives the values of the variate x
reckoned from the provisional origin, or the centre
of the age interval 65-69. (2) is x less the first
semi-invariant, whereby the origin is shifted to
the mean or $\lambda_1$. Column (3) represents the final
linear transformation: $z = (x - \lambda_1) : \sigma$.

Columns (4), (5) and (6) are copied directly
from the standard tables of Jørgensen or Charlier.
Column (7) is (5) multiplied by 0.0258 or the
product $-[c_3\varphi_3(z)] : 3!$, while (8) is $[c_4\varphi_4(z)] : 4!$.

Column (9) is the sum of (4), (7) and (8).
If we now distribute the area $N = s_0$ or 1130 pro
rata according to (9), we finally reach the theoretical
frequency distribution expressed in 5-year
age intervals and shown in column (10) alongside
which we have inserted the originally observed
values. Evidently the fit is satisfactory. It will
be noted that the final frequency series is expressed
in units of 5-year age intervals. This, however,
is only a formal representation. By subdividing
the unit intervals of column (1) in 5
equal parts, and by computing all the other
columns accordingly, we get the theoretical
frequency series expressed in single year age intervals.
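The columns of the scheme can be sketched in code, with the tabulated functions replaced by direct evaluation. Several things here are assumptions rather than statements of the text: the derivatives are taken as $\varphi_3(z) = (3z - z^3)\varphi_0(z)$ and $\varphi_4(z) = (z^4 - 6z^2 + 3)\varphi_0(z)$, the twelve age classes are numbered x = -6 to x = 5 with x = 0 at the class 65-69, and "pro rata" is read as simple proportional scaling of column (9) to the total of 1130. The coefficients and the transformation are those quoted above.

```python
import math

N = 1130
C3, C4 = 0.0258, 0.0158          # coefficients quoted in the text
SHIFT, SIGMA = 0.0195, 1.6240    # z = (x + .0195) / 1.6240

def phi0(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def phi3(z):
    """Third derivative of phi0 (assumed form)."""
    return (3 * z - z ** 3) * phi0(z)

def phi4(z):
    """Fourth derivative of phi0 (assumed form)."""
    return (z ** 4 - 6 * z ** 2 + 3) * phi0(z)

xs = list(range(-6, 6))                     # column (1): 35-39 ... over 90
zs = [(x + SHIFT) / SIGMA for x in xs]      # column (3)
col9 = [phi0(z) + C3 * phi3(z) + C4 * phi4(z) for z in zs]

# Column (10): distribute N proportionally to column (9).
total = sum(col9)
theoretical = [N * v / total for v in col9]

for x, t in zip(xs, theoretical):
    print(f"{x:+d}  {t:8.1f}")
```

The printed column may be set beside the observed numbers of pensioners in the table above for a comparison of the kind the scheme makes.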

15. THE PRINCIPLE OF METHOD OF LEAST SQUARES

The following paragraph purports to give a brief exposition
of the determination of the coefficients
in the Gram or Laplacean-Charlier
series in the sense of the method of least squares
as a strict problem of maxima and minima, wholly
independent of the connection between the method
of least squares and the error laws of precision
measurements. 1

The simple problem in maxima and minima 
which forms the fundamental basis of the method 



1 In the following demonstration I am adhering to
the brief and lucid exposition of the Argentinean actuary,
U. Broggi, in his excellent Traité d'Assurances sur la Vie.
of least squares is the following: Let m unknown
quantities be determined by observations in such
a manner that they are not observed directly but
enter into certain known functional relations,
$f_i(x_1, x_2, x_3, \ldots, x_m)$, containing the unknown
independent variables, $x_1, x_2, x_3, \ldots, x_m$. Let
furthermore the number of observations on such
functional relations be n (where n is greater than
m). The problem is then to determine the most
plausible system of the values of the unknowns
from the observed system

$$f_1(x_1, x_2, x_3, \ldots, x_m) = o_1$$

$$\cdots$$

$$f_n(x_1, x_2, x_3, \ldots, x_m) = o_n$$

when $f_1, f_2, \ldots, f_n$ are the known functional
relations and $o_1, o_2, \ldots, o_n$ their observed values.
Such equations are known as observation
equations.

In order to further simplify our problem we
shall also assume that

1 All the equations of the system have the
same weight, and

2 All the equations are reduced to linear form.

By these assumptions the problem is reduced
to finding m unknowns from n linear equations.

$$a_1x_1 + b_1x_2 + \cdots = o_1$$

$$a_2x_1 + b_2x_2 + \cdots = o_2$$

$$a_3x_1 + b_3x_2 + \cdots = o_3$$

$$\cdots$$

$$a_nx_1 + b_nx_2 + \cdots = o_n$$



Since n is greater than m we find the problem
over-determined, and we therefore seek to determine
the unknown quantities, $x_1, x_2, \ldots, x_m$, in
such a way that the sum of the squares of the
differences between the functional relations and
the observed values, o, becomes a minimum. This
implies that the expression

$$\sum_{i=1}^{n} \left(a_ix_1 + b_ix_2 + \cdots - o_i\right)^2 = \varphi(x_1, x_2, \ldots, x_m)$$

must be a minimum or the simultaneous existence
of the equations

$$\frac{\partial\varphi}{\partial x_1} = 0, \quad \frac{\partial\varphi}{\partial x_2} = 0, \quad \ldots, \quad \frac{\partial\varphi}{\partial x_m} = 0. \qquad (I)$$

If we now introduce the following notation

$$a_ix_1 + b_ix_2 + \cdots - o_i = \lambda_i \quad \text{for } i = 1, 2, 3, \ldots, n,$$

the m equations in the above system (I) evidently
take on the following form

$$\lambda_1a_1 + \lambda_2a_2 + \cdots + \lambda_na_n = 0$$

$$\lambda_1b_1 + \lambda_2b_2 + \cdots + \lambda_nb_n = 0$$

$$\cdots$$



If we now again re-substitute the expressions
for $\lambda$ in terms of the linear relations

$$a_ix_1 + b_ix_2 + \cdots - o_i = \lambda_i \quad \text{for } i = 1, 2, 3, \ldots, n,$$

and collect the coefficients of $x_1, x_2, \ldots, x_m$, these
equations may be expressed in the following symbolical
form:

$$[aa]x_1 + [ab]x_2 + \cdots - [ao] = 0$$

$$[ab]x_1 + [bb]x_2 + \cdots - [bo] = 0$$

$$\cdots$$

$$[ak]x_1 + [bk]x_2 + \cdots + [kk]x_m - [ko] = 0$$

where

$$[aa] = a_1^2 + a_2^2 + \cdots$$

$$[ab] = a_1b_1 + a_2b_2 + \cdots$$

is the Gaussian notation for the homogeneous sum
products.

The above equations are known as normal
equations, and it is readily seen that there is one
normal equation corresponding to each unknown.
Our problem is therefore reduced to the solution
of a system of simultaneous linear equations of m
unknowns. If m is a small number, or, what
amounts to the same thing, there are only two or
three unknowns the solution can be carried on
by simple algebraic methods or determinants. If
the number of unknowns is large these methods
become very laborious and impractical. It is one
of the achievements of the great German mathematician,
Gauss, to have given us a method of
solution which reduces this labor to a minimum
and which proceeds along well defined systematic
and practical lines. The method is known as the
Gaussian algorithmus of successive elimination.
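The formation of the normal equations from the observation equations is purely mechanical, as the following sketch shows. The helper names and the four observation equations are hypothetical; only the bracket notation $[uv] = u_1v_1 + u_2v_2 + \cdots$ is taken from the text.

```python
def sum_product(u, v):
    """Gaussian bracket [uv] = u_1*v_1 + u_2*v_2 + ..."""
    return sum(p * q for p, q in zip(u, v))

def normal_equations(columns, o):
    """Build the normal equations from equally weighted linear observation equations.

    columns[j][i] is the coefficient of the (j+1)-th unknown in equation i,
    o[i] the observed value. Returns (A, rhs) with A[j][k] = [jk] and
    rhs[j] = [jo], so that the normal equations read A x = rhs.
    """
    A = [[sum_product(cj, ck) for ck in columns] for cj in columns]
    rhs = [sum_product(cj, o) for cj in columns]
    return A, rhs

# Hypothetical observation equations a_i*x1 + b_i*x2 = o_i (n = 4, m = 2).
a = [1.0, 1.0, 1.0, 1.0]
b = [0.0, 1.0, 2.0, 3.0]
o = [1.1, 2.9, 5.2, 6.8]

A, rhs = normal_equations([a, b], o)
print(A)    # [[aa], [ab]], [[ab], [bb]]
print(rhs)  # [ao], [bo]
```

The matrix of sum-products is symmetrical, which is why only its upper triangle need be written down in hand computation.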



16. GAUSS' SOLUTION OF NORMAL EQUATIONS

For the sake of simplicity we shall limit ourselves to a
system of four normal equations
of the form

$$[aa]x_1 + [ab]x_2 + [ac]x_3 + [ad]x_4 - [ao] = 0$$

$$[ab]x_1 + [bb]x_2 + [bc]x_3 + [bd]x_4 - [bo] = 0$$

$$[ac]x_1 + [bc]x_2 + [cc]x_3 + [cd]x_4 - [co] = 0$$

$$[ad]x_1 + [bd]x_2 + [cd]x_3 + [dd]x_4 - [do] = 0$$

The generalization to an arbitrary number of
unknowns offers no difficulties, however.

On account of their symmetrical form the
above equations may also be written in the more
convenient form, viz.:

$$\begin{aligned}
[aa]x_1 + [ab]x_2 + [ac]x_3 + [ad]x_4 - [ao] &= 0 \\
[bb]x_2 + [bc]x_3 + [bd]x_4 - [bo] &= 0 \\
[cc]x_3 + [cd]x_4 - [co] &= 0 \\
[dd]x_4 - [do] &= 0
\end{aligned}$$

From the first equation we find

$$x_1 = \frac{[ao]}{[aa]} - \frac{[ab]}{[aa]}\,x_2 - \frac{[ac]}{[aa]}\,x_3 - \frac{[ad]}{[aa]}\,x_4.$$

Substituting this value in the following equations
and by the introduction of the new symbol

$$[ik] - \frac{[ai]}{[aa]}\,[ak] = [ik.1]$$

we now obtain a new system of equations of a
lower order and of the form

$$\begin{aligned}
[bb.1]x_2 + [bc.1]x_3 + [bd.1]x_4 - [bo.1] &= 0 \\
[cc.1]x_3 + [cd.1]x_4 - [co.1] &= 0 \\
[dd.1]x_4 - [do.1] &= 0
\end{aligned}$$

Solving for $x_2$ we have

$$x_2 = \frac{[bo.1]}{[bb.1]} - \frac{[bc.1]}{[bb.1]}\,x_3 - \frac{[bd.1]}{[bb.1]}\,x_4.$$

Substituting in the following equations and
writing

$$[ik.1] - \frac{[bi.1]}{[bb.1]}\,[bk.1] = [ik.2],$$

we have

$$[cc.2]x_3 + [cd.2]x_4 = [co.2]$$

$$[dd.2]x_4 = [do.2]$$

or

$$x_3 = \frac{[co.2]}{[cc.2]} - \frac{[cd.2]}{[cc.2]}\,x_4.$$

Moreover, by writing

$$[ik.2] - \frac{[ci.2]}{[cc.2]}\,[ck.2] = [ik.3],$$

we have finally

$$[dd.3]x_4 = [do.3].$$

This gives us the final reduced normal equation
of the lowest order. By successive substitution
we therefore have:

$$x_4 = \frac{[do.3]}{[dd.3]}$$

$$x_3 = \frac{[co.2]}{[cc.2]} - \frac{[cd.2]}{[cc.2]}\,x_4$$

$$x_2 = \frac{[bo.1]}{[bb.1]} - \frac{[bc.1]}{[bb.1]}\,x_3 - \frac{[bd.1]}{[bb.1]}\,x_4$$

$$x_1 = \frac{[ao]}{[aa]} - \frac{[ab]}{[aa]}\,x_2 - \frac{[ac]}{[aa]}\,x_3 - \frac{[ad]}{[aa]}\,x_4$$

as the ultimate solution of the unknowns.
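The successive reductions and the back substitution may be sketched as follows. This is an illustration under stated assumptions, not Gauss' printed schedule: the normal equations are taken in the form A x = rhs with A symmetrical, no rearrangement of rows is attempted (none is needed when the leading sum-products such as [aa] and [bb.1] are not zero), and the four equations in the example are hypothetical.

```python
def gauss_solve(A, rhs):
    """Solve symmetric normal equations A x = rhs by successive elimination."""
    n = len(A)
    # Augmented working copy: each row holds the coefficients and the [.o] term.
    M = [list(A[j]) + [rhs[j]] for j in range(n)]

    # Reduction p replaces [ik.(p)] by [ik.(p)] - [pi.(p)]/[pp.(p)] * [pk.(p)].
    for p in range(n - 1):
        for i in range(p + 1, n):
            factor = M[i][p] / M[p][p]
            for k in range(p, n + 1):
                M[i][k] -= factor * M[p][k]

    # Back substitution, starting from the last reduced equation.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

# Hypothetical system of four normal equations with known solution 1, 2, 3, 4.
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, 2.0]]
rhs = [6.0, 10.0, 12.0, 11.0]

solution = gauss_solve(A, rhs)
print(solution)
```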




17. ARITHMETICAL APPLICATION OF METHOD

The example in paragraph 13 gave an illustration of the
application of the method of moments.
As previously stated this method works
quite well in cases of moderate skewness, but is
less successful in extremely skew curves and where
the excess is large. We shall now give an illustration
of the calculation of the parameters by the
method of least squares. The example we choose
is the well-known statistical series by the distinguished
Dutch botanist, de Vries, on the number
of petals in flowers of Ranunculus bulbosus. This
is also one of the classical examples of Karl Pearson
in his celebrated original memoirs on skew variation.
Although the observations of de Vries lend
themselves more readily to the method of logarithmic
transformation, which we shall discuss in a
following chapter, we have deliberately chosen to
use them here for two specific reasons. Firstly it is
a most striking illustration in refutation of the
immature criticism of the Gram-Charlier series
by a certain young and very incautious American
actuary, Mr. M. Davis, who has gone on record
with the positive statement, "that the Charlier
series fails completely in case of appreciable skewness".
Secondly (and this is the more important
reason) it offers an excellent drill for the student
in the practical applications of the method of least
squares because it gives in a very brief compass
all the essential arithmetical details. The observations
of de Vries are as follows:



No. of petals      x      F(x) = o
      5            0        133
      6            1         55
      7            2         23
      8            3          7
      9            4          2
     10            5          2



where F(x) denotes the absolute frequencies. The 
observed frequency distribution is well nigh as 
skew as it can be and represents in fact a one- 
sided curve, and should therefore — if the state- 
ment by Mr. Davis is correct — show an absolute 
defiance to a graduation by the Gram-Charlier 
series. 

The process we shall use in the attempted 
mathematical representation of the above series is 
a combination of the method of semi-invariants 
and the method of least squares. Following
Thiele's advice we determine the first two
semi-invariants in the generating function directly from
the observations while the coefficients of this
function and its derivatives are determined by
the least square method.

Choosing the provisional origin at 5, we obtain 
the following values for the crude moments. 
$$s_0 = 222, \quad s_1 = 140, \quad s_2 = 292, \quad s_3 = 806, \quad s_4 = 2{,}752,$$

$$s_5 = 10{,}790, \quad s_6 = 46{,}072, \quad s_7 = 207{,}226,$$

from which we find that

$$\lambda_0 = 1, \quad \lambda_1 = 0.631, \quad \lambda_2 = 0.917, \quad \lambda_3 = 1.644,$$

$$\lambda_4 = 3.377, \quad \lambda_5 = 5.972, \quad \lambda_6 = -2.911, \quad \lambda_7 = -122.638.$$
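The crude moments and the lower semi-invariants can be recomputed from the observed series. The sketch below uses a recursion that is not given in the text and is introduced here as an assumption: $\lambda_n = m_n - \sum_{k=1}^{n-1} \binom{n-1}{k-1} \lambda_k m_{n-k}$, with $m_r = s_r : s_0$. For n = 2, 3, 4 it reproduces the explicit formulae of paragraph 13. Only the first five orders are evaluated.

```python
from math import comb

# de Vries' series: x reckoned from the provisional origin at 5 petals.
petal_x = [0, 1, 2, 3, 4, 5]
freq = [133, 55, 23, 7, 2, 2]

def power_sum(r):
    """Crude moment s_r = sum of x^r * F(x)."""
    return sum(f * x ** r for x, f in zip(petal_x, freq))

def semi_invariants(max_order):
    """Return [lambda_1, ..., lambda_max_order] by the recursion named above."""
    s0 = power_sum(0)
    m = [power_sum(r) / s0 for r in range(max_order + 1)]
    lam = [0.0] * (max_order + 1)
    for n in range(1, max_order + 1):
        correction = sum(comb(n - 1, k - 1) * lam[k] * m[n - k]
                         for k in range(1, n))
        lam[n] = m[n] - correction
    return lam[1:]

sums = [power_sum(r) for r in range(5)]
lams = semi_invariants(5)
print(sums)
print([round(v, 3) for v in lams])
```

The higher orders follow from the same recursion, but, as remarked below, they are so strongly affected by random sampling in a short series that little weight can be placed on them.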

All these semi-invariants with the exception 
of the two first are, however, so greatly influenced 
by random sampling in the small observation 
series that it is hopeless to use them in the deter- 
mination of the constants in the Gram-Charlier 
series. In fact an actual calculation does not give 
a very good result beyond that of a first rough 
approximation. The generating function, on the
other hand, may be expressed by the aid of the
two first semi-invariants as follows:

$$\varphi_0(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2},$$

where z is given by the linear transformation:
$z = (x - 0.631) : 0.9576$ $\left(\sqrt{\lambda_2} = 0.9576\right)$.

We now propose to express the observed function
F(x) or $\varphi(z)$ by a Gram-Charlier series of
the form:

$$F(x) = \varphi(z) = k_0\varphi_0(z) + k_3\varphi_3(z) + k_4\varphi_4(z).$$

In this equation we know the values of the
generating function and its derivatives for various
values of the variate z as found in the tables of
Jørgensen and Charlier, while the quantities k are
unknowns. On the other hand we know 6 specific
values of F(x) as directly observed in de Vries's
observation series. We are thus dealing with a
system of typical linear observation equations of
the forms described in paragraphs 15 and 16
and which lend themselves so admirably to the
treatment by the method of least squares.

From the above linear relation between x and 
z we can directly compute the following table for 
the transformed variate z. 



x        z
0     -0.688
1     +0.402
2     +1.493
3     +2.583
4     +3.674
5     +4.764



The numerical values of $\varphi_0(z)$ and its derivatives
as corresponding to the above values of z can
be taken directly from the standard tables of Jørgensen
and Charlier. We may therefore write
down the following observation equations:

   φ_0           φ_3           φ_4
.3148 k_0  -  .5472 k_3  +  .1207 k_4  -  133 = 0
.3679 k_0  +  .4198 k_3  +  .7566 k_4  -   55 = 0
.1308 k_0  +  .1506 k_3  -  .7073 k_4  -   23 = 0
.0145 k_0  -  .1346 k_3  +  .1062 k_4  -    7 = 0
.0005 k_0  -  .0180 k_3  +  .0486 k_4  -    2 = 0
.0001 k_0  -  .0005 k_3  +  .0020 k_4  -    2 = 0

for which we now propose to determine the unknown
values of k by the least square method.
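The coefficients of the observation equations can be recomputed without tables. The sketch assumes that the derivatives of the generating function are $\varphi_3(z) = (3z - z^3)\varphi_0(z)$ and $\varphi_4(z) = (z^4 - 6z^2 + 3)\varphi_0(z)$, and it takes the six values of z exactly as they stand in the table above rather than recomputing them from x.

```python
import math

z_table = [-0.688, 0.402, 1.493, 2.583, 3.674, 4.764]  # as tabulated above
observed = [133, 55, 23, 7, 2, 2]

def phi0(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def phi3(z):
    """Third derivative of phi0 (assumed form)."""
    return (3 * z - z ** 3) * phi0(z)

def phi4(z):
    """Fourth derivative of phi0 (assumed form)."""
    return (z ** 4 - 6 * z ** 2 + 3) * phi0(z)

# One row per observation equation: coefficients of k0, k3, k4 and the observed value.
rows = [(phi0(z), phi3(z), phi4(z), o) for z, o in zip(z_table, observed)]

for p0, p3, p4, o in rows:
    print(f"{p0:.4f} k0 {p3:+.4f} k3 {p4:+.4f} k4 {-o:+d} = 0")
```

The printed coefficients should agree with the tabulated ones to about three decimals, small differences being due to rounding in the tables.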

While this method may of course be applied 
directly to the above data, it will generally be 
found of advantage to start with some approximate 
values of the k's. It is found in practice that 
this approximate step saves considerable labour 
in the formation and ultimate solution of the 
normal equations. 

Although the first approximation in the case
of numerous unknowns must be in the nature of
a more or less shrewd guess, which facility can
only be attained by constant practice in routine
mathematical computing, we are, however, in this
specific instance able to tell something about the
nature of the coefficients from purely a priori
considerations. We know for instance from the form
of the Gram-Charlier series that the coefficient $k_0$
of the generating function must be nearly equal
to the area of the curve, which in this particular
instance is 222. Moreover, a mere glance at the
observed series tells us that it has a decidedly
large skewness in negative direction from the
mean coupled with a tendency of being "top
heavy", indicating positive excess. We can therefore
assume as a first approximation that the
coefficients of the derivatives of uneven order are
negative and the coefficients of derivatives of even
order are positive.

From such purely common sense a priori considerations
we therefore guess the following first
approximations, viz.:

$$k'_0 = 222, \quad k'_3 = -25, \quad k'_4 = 30.$$

The probable values of the various k's may be
written as

$$k_i = r_i k'_i \quad \text{for } i = 0, 3, 4,$$

and our problem is therefore to find the correction
factor $r_i$ with which the approximate value $k'_i$
must be multiplied so as to give $k_i$.

Applying the various values of $k'_i$ to the
original observation equations on page 64 we obtain
the following schedule for the numerical factors
of the r's:



    a         b         c          o          s
  69.9     +13.7     + 3.6     -133.0     -45.8
  81.7     -10.5     +22.7     - 55.0     +38.9
  29.1     - 3.8     -21.2     - 23.0     -18.9
   3.3     + 3.4     + 3.2     -  7.0     + 2.9
   0.1     + 0.5     + 1.5     -  2.0     + 0.1
   0.0     + 0.0     + 0.0     -  2.0     - 2.0
 -----------------------------------------------
 184.1     + 3.3     + 9.8     -222.0     -24.8

where the additional control column s serves as a
check.

The subsequent formation of the various sum- 
products and normal equations is shown in the 
following schedules together with the s columns 
as a check. 





     aa          ab          ac          ao          as
  + 4,886     +   958     +   252     -  9,297     - 3,201
  + 6,675     -   858     + 1,855     -  4,494     + 3,178
  +   847     -   111     -   617     -    669     -   550
  +    11     +    11     +    11     -     23     +    10
  +     0     +     0     +     0     -      0     +     0
  +     0     +     0     +     0     -      0     +     0
 -----------------------------------------------------------
  +12,419     +     0     + 1,501     - 14,483     -   563

     bb          bc          bo          bs
  +   188     +    49     - 1,822     -   628
  +   110     -   238     +   578     -   408
  +    14     +    81     +    87     +    72
  +    12     +    11     -    24     +    10
  +     0     +     1     -     1     +     0
  +     0     +     0     -     0     +     0
 ----------------------------------------------
  +   324     -    96     - 1,182     -   954

     cc          co          cs
  +    13     -   479     -   165
  +   515     - 1,249     +   883
  +   449     +   488     +   401
  +    10     -    22     +     9
  +     2     -     3     +     1
  +     0     -     0     +     0
 ----------------------------------
  +   989     - 1,265     + 1,129
We may now write the normal equations in
schedule form as follows:

ORIGINAL NORMAL EQUATIONS

(a)   +12,419    +0          +1,501      -14,483
(1)              +0          +0          -0
(b)              +324        -96         -1,182
(2)                          +181        -1,750
(c)                          +989        -1,265
(3)              +.00000     +.12086     -1.16617

The sum-products from the observation equations
are shown in the rows marked (a), (b), (c).
The row marked (3) and printed in italics is
formed by dividing each of the figures in row (a)
by 12,419. The row marked (1) contains the
products of the figures in row (a) multiplied by
the factor .00000. All these products happen in
this case to be equal to zero. Row (2) is the
products of the factor 0.12086 and the figures in
row (a).

We next subtract row (1) from row (b), row
(2) from row (c), which results in the following
schedule, which is known as the first reduction
equation.

FIRST REDUCTION EQUATIONS

(b)   +324       -96         -1,182
(1)              +28         +350
(c)              +808        +485
(2)              -.29626     -3.64814
The above equations are treated in a similar
manner as the original normal equations, and we
have therefore the 2nd reduction equation of the
form:

SECOND REDUCTION EQUATION

      +780       +135

The solution for the unknown r's may now
be shown as follows:

$$r_4 = -135 : 780 = -.17308$$

$$r_3 = 3.64814 - (-.29626)(-.17308) = 3.59637$$

$$r_0 = 1.16617 - (0.0)(3.59637) - (.12086)(-.17308) = 1.18709.$$

From which we find:

$$k_0 = 263.5, \quad k_3 = -89.9, \quad k_4 = -5.1$$
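The reduction schedule can be retraced step by step. The sketch below starts from the sum-products of the original normal equations and carries full decimals throughout, so its results differ slightly from the rounded figures of the schedule. It writes the equations with the [.o] terms on the right-hand side, which is an arrangement chosen here for convenience; the first approximations 222, -25 and 30 are those guessed above.

```python
# Sum-products of the normal equations (unknowns r0, r3, r4).
aa, ab, ac, ao = 12419.0, 0.0, 1501.0, 14483.0
bb, bc, bo = 324.0, -96.0, 1182.0
cc, co = 989.0, 1265.0

# First reduction: eliminate r0, i.e. form the [ik.1] quantities.
bb1 = bb - ab / aa * ab
bc1 = bc - ab / aa * ac
bo1 = bo - ab / aa * ao
cc1 = cc - ac / aa * ac
co1 = co - ac / aa * ao

# Second reduction: eliminate r3, i.e. form the [ik.2] quantities.
cc2 = cc1 - bc1 / bb1 * bc1
co2 = co1 - bc1 / bb1 * bo1

# Back substitution.
r4 = co2 / cc2
r3 = (bo1 - bc1 * r4) / bb1
r0 = (ao - ab * r3 - ac * r4) / aa

# Corrected coefficients k_i = r_i * k'_i.
k0, k3, k4 = 222 * r0, -25 * r3, 30 * r4

print(round(r0, 5), round(r3, 5), round(r4, 5))
print(round(k0, 1), round(k3, 1), round(k4, 1))
```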

Applying these factors to the values of $\varphi_0(z)$,
$\varphi_3(z)$ and $\varphi_4(z)$ we obtain the following result: 1

 k_0 φ_0     k_3 φ_3     k_4 φ_4     Σ k_i φ_i     Obs.
   82.9       +49.2       -0.6         131.5        133
   96.9       -37.7       -3.9          55.3         55
   34.5       -13.5       +3.6          24.6         23
    3.8       +12.1       -0.5          15.4          7
    0.1       + 1.0       -0.2           0.9          2
    0.0       + 0.0       -0.0           0.0          2



1 For a closer approximation see my Mathematical 
Theory of Probabilities (Second Edition, New York, 1921). 
18. TRANSFORMATION OF THE VARIATE

While it is always possible to express all frequency curves by
an expansion in Hermite polynomials, the numerical labor when
carried on by the method of least squares often involves a large
amount of arithmetical work if we wish to retain
more than four or five terms of the series. Other
methods lessening the arithmetical work and making
the actual calculations comparatively simple
have been offered by several authors and notably
by Thiele, who in his works discusses several
such methods. Among those we may mention the
method of the so-called free functions and orthogonal
substitution, the method of correlates and
the adjustment by elements. The chapters on
these methods in Thiele's work are among some
of the most important, but also some of the
most difficult in the whole theory of observations
and have not always been understood and appreciated
by the mathematicians, chiefly on account
of Thiele's peculiar style of writing. A close study
of the Danish scholar's investigations is, however,
well worth while, and Thiele's work along
these lines may still in the future become as
epochmaking in the theory of probability as some
of the researches of the great Laplace. The
theory of infinite determinants as used by M.
Fredholm in the solution of integral equations is
another powerful tool which offers great advantages
in the way of rapid calculation. All these
methods require, however, that the student must
be thoroughly familiar with the difficult theory
upon which such methods rest, and they have
for this reason been omitted in an elementary
work such as the present treatise.

We wish, however, to mention another method 
which in the majority of cases will make it pos- 
sible to employ the Gram or Laplacean — Charlier 
curves in cases with extreme skewness or excess. 
We have here reference to the method of logarith- 
mic transformation of the variate, x. 



19. THE GENERAL TRANSFORMATION

One of the simplest transformations is the previously
mentioned linear transformation
of the form z = f(x) = ax + b, by which
we can make two constants, $c_1$ and $c_2$, vanish.
Other transformations suggest themselves, however,
such as $f(x) = ax^2 + bx + c$, $f(x) = \sqrt{x}$,
$f(x) = \log x$ and so forth. For this reason I propose
to give a brief development of the general
method of transformations of the statistical
variates, mainly following the methods of Charlier
and Jørgensen.

Stated in its most general form our problem
is: If a frequency curve of a certain variate is
given by F(x) what will be the frequency curve
of a certain function of x, say f(x)?

The equation of the frequency curve is y =
F(x), which means that F(x)dx is the probability
that x falls in the interval between $x - \frac{1}{2}dx$ and
$x + \frac{1}{2}dx$. The probability that a new variate z
after the transformation z = f(x), or x = x(z),
falls in the interval $z - \frac{1}{2}dz$ and $z + \frac{1}{2}dz$ is therefore
simply

$$F[x(z)]\,x'(z)\,dz = F(x)\,dx,$$

which gives in symbolic form the equation of the
transformed frequency curve.

The frequency for z = f(x) is of course the
same as for x. The ordinates of the frequency
curve, or rather the areas between corresponding
ordinates, are therefore not changed, but the abscissa
axis is replaced by f(x). Equidistant intervals
of x will therefore not as a rule (except in
the linear transformation) correspond to equidistant
intervals of f(x).

If, for instance, the frequency curve F(x) is
the Laplacean normal curve

$$F(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2 / 2\sigma^2}$$

and if we let $z = f(x) = x^2$ or $x = \sqrt{z}$, we have
evidently

$$F[x(z)]\,x'(z) = \frac{1}{\sigma\sqrt{2\pi}}\,\frac{1}{2\sqrt{z}}\, e^{-z / 2\sigma^2}.$$
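That the areas between corresponding ordinates are unchanged by the transformation can be verified numerically for this example. The sketch assumes a hypothetical dispersion and a hypothetical pair of ordinates on the positive branch $x = \sqrt{z}$, and uses a simple midpoint rule; none of these choices comes from the text.

```python
import math

SIGMA = 1.5  # hypothetical dispersion

def F(x):
    """Laplacean normal curve with origin at the mean."""
    return math.exp(-x * x / (2 * SIGMA ** 2)) / (SIGMA * math.sqrt(2 * math.pi))

def G(z):
    """Transformed curve F[x(z)] x'(z) for x(z) = sqrt(z)."""
    root = math.sqrt(z)
    return F(root) / (2 * root)

def midpoint(f, lo, hi, steps=20000):
    """Midpoint-rule estimate of the area under f between lo and hi."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

x1, x2 = 0.5, 2.0                      # hypothetical ordinates in x
area_x = midpoint(F, x1, x2)           # area between x1 and x2
area_z = midpoint(G, x1 ** 2, x2 ** 2) # area between the corresponding z values

print(area_x, area_z)
```

The two areas agree although the interval in z is not of the same length as the interval in x, which illustrates the remark on equidistant intervals.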



20. LOGARITHMIC TRANSFORMATION

Of the various transformations the logarithmic is of special
importance. It happens that
even if the variate x forms an extremely skew
frequency distribution its logarithms will be
nearly normally distributed.

This fact was already noted by the eminent 
German psychologist, Fechner, and also men- 
tioned by Bruhns in his Kollektivmasslehre. But 
neither Fechner nor Bruhns have given a satis- 
factory theoretical explanation of the transforma- 
tion and have limited themselves to use it as a 
practical rule of thumb. 

Thiele discusses the method under his adjustment
by elements, but in a rather brief manner.
The first satisfactory theory of logarithmic transformation
seems to have been given first by Jørgensen
and later on by Wicksell. 1 Jørgensen
first begins with the transformation of the normal
Laplacean frequency curve. Letting z = log x and
bearing in mind that the frequency of x equals
that of log x we have

$$z = f(x) = \log x, \quad \text{or} \quad x = x(z) = e^z \quad \text{and} \quad dx = e^z\,dz.$$

1 The law of errors, leading to the geometric mean
as the most probable value of the variate as discovered
by Prof. Dr. Th. N. Thiele in 1867 may, however, be
considered as a forerunner of Jørgensen's work.

The continuous power sums or moments of
the rth order around the lower limit take on
the form

$$M_r = \left(n\sqrt{2\pi}\right)^{-1} N \int_{0}^{\infty} x^r\, e^{-\frac{1}{2}\left(\frac{\log x - m}{n}\right)^2} dx = \left(n\sqrt{2\pi}\right)^{-1} N \int_{-\infty}^{+\infty} e^{rz}\, e^{z}\, e^{-\frac{1}{2}\left(\frac{z - m}{n}\right)^2} dz$$

on the assumption that log x is normally distributed.

The change in the lower limit in the second
integral from $-\infty$ to zero arises simply from the
fact that the logarithm of zero equals minus infinity
and the point $-\infty$ is thus by the transformation
moved up to zero.

By a straightforward transformation we may
write the above integral as

$$M_r = \frac{N}{\sqrt{2\pi}}\, e^{m(r+1) + \frac{1}{2}n^2(r+1)^2} \int_{-\infty}^{+\infty} e^{-\frac{1}{2}t^2}\,dt = N\, e^{m(r+1) + \frac{1}{2}n^2(r+1)^2}.$$

Changing from moments to semi-invariants by
means of the well-known relations

$$\lambda_0 = M_0$$

$$\lambda_1 = M_1 : M_0$$

$$\lambda_2 = \left(M_2M_0 - M_1^2\right) : M_0^2$$

$$\lambda_3 = \left(M_3M_0^2 - 3M_2M_1M_0 + 2M_1^3\right) : M_0^3$$

$$\lambda_4 = \left(M_4M_0^3 - 4M_3M_1M_0^2 - 3M_2^2M_0^2 + 12M_2M_1^2M_0 - 6M_1^4\right) : M_0^4$$

we have

$$\lambda_0 = N e^{m + \frac{1}{2}n^2}$$

$$\lambda_1 = e^{m + 1.5n^2}$$

$$\lambda_2 = e^{2m + 3n^2}\left(e^{n^2} - 1\right)$$

$$\lambda_3 = e^{3m + 4.5n^2}\left(e^{3n^2} - 3e^{n^2} + 2\right)$$

$$\lambda_4 = e^{4m + 6n^2}\left(e^{6n^2} - 4e^{3n^2} - 3e^{2n^2} + 12e^{n^2} - 6\right).$$

These equations give the semi-invariants expressed
in terms of m and n. On the other hand
if we know the semi-invariants from statistical
data or are able to determine these semi-invariants
by a priori reasoning we may find the parameters
m and n.

21. the mathema- A point which we must bear 
in mind is that the above semi- 
invariants on account of the 
transformation are calculated around a zero point 
which corresponds to a fixed lower limit of the 
observations. 

Very often the observations themselves in- 
dicate such a lower limit beyond which the fre- 
quencies of the variate vanish. In the case of 
persons engaged in factory work there is in most 
countries a well-defined legal age limit below 
which it is illegal to employ persons for work. 
Another example is offered in the number of 
alpha particles radiated from certain radioactive 
metals. Since the number of particles radiated 
in a certain interval of time must either be zero 
or a whole positive number it is evident that — 1 
must be the lower limit because we can have no 
negative radiations. Analogous limits exist in the 
age limit for divorces and in the amount of 
moneys assessed in the way of income tax.

The lower limit allows, however, of a more exact
mathematical determination by means of the following
simple considerations. It is evident that this lower limit
must fall below the mean value of the frequency curve.
Let us suppose that it is located at a point, $\alpha$, located
say $\eta$ units in negative direction from the mean,
$M = \lambda_1$, and let us to begin with select $\lambda_1$ as the
origin of the coordinate system, in which case the first
semi-invariant, $\lambda_1$, is equal to zero. Transferring the
origin to $\alpha$, the first semi-invariant equals $\eta$, while
the semi-invariants of higher order remain the same as
before the transformation, and we have:

$$\lambda_1 - \alpha = \eta = e^{m + 1.5 n^2}$$

$$\lambda_2 = \eta^2 \left(e^{n^2} - 1\right) \quad \text{or} \quad e^{n^2} = 1 + \frac{\lambda_2}{\eta^2}$$

$$\frac{\lambda_3}{\eta^3} = \frac{\lambda_2^3}{\eta^6} + \frac{3\lambda_2^2}{\eta^4}$$

which reduces to $\lambda_3 \eta^3 - 3\lambda_2^2 \eta^2 - \lambda_2^3 = 0$.

The solution of this cubic equation, which has one real
and two imaginary roots, gives us the value of $\eta$ or
$\lambda_1 - \alpha$ and thus determines the mathematical zero
or lower limit. We have in fact:

$$n^2 = \log\left(1 + \frac{\lambda_2}{\eta^2}\right) \quad \text{and} \quad m = \log \eta - 1.5 n^2, \quad \text{while}$$



$$N = \lambda_0 : e^{m + \frac{1}{2} n^2}$$

22. LOGARITHMICALLY TRANSFORMED FREQUENCY SERIES

We have already shown that the generalized frequency
curve could be written as

$$F(x) = c_0 \varphi_0(x) - \frac{c_1 \varphi_1(x)}{1!} + \frac{c_2 \varphi_2(x)}{2!} - \frac{c_3 \varphi_3(x)}{3!} + \dots$$

where the Laplacean probability function

$$\varphi_0(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - M)^2}{2\sigma^2}}$$

is the generating function with $M$ and $\sigma$ as its
parameters.

The suggestion now immediately arises to use an
analogous series in the case of the logarithmic
transformation. In this case the frequency curve, $F(x)$,
with a lower limit would be expressed as follows:

$$F(x) = k_0 \Phi_0(x) - \frac{k_1 \Phi_1(x)}{1!} + \frac{k_2 \Phi_2(x)}{2!} - \frac{k_3 \Phi_3(x)}{3!} + \dots$$

while the generating function now is

$$\Phi_0(x) = \frac{1}{n \sqrt{2\pi}} \, e^{-\frac{1}{2} \left[ \frac{\log x - m}{n} \right]^2}$$

where $m$ and $n$ are the parameters.
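1 The symbol $n!$ denotes the factorial $1 \cdot 2 \cdot 3 \cdots n$.

Using the usual definition of semi-invariants we then
have

$$s_0 \, e^{\frac{\lambda_1 \omega}{1!} + \frac{\lambda_2 \omega^2}{2!} + \frac{\lambda_3 \omega^3}{3!} + \dots} = s_0 + \frac{s_1 \omega}{1!} + \frac{s_2 \omega^2}{2!} + \frac{s_3 \omega^3}{3!} + \dots = \int_0^\infty e^{x\omega} \left[ k_0 \Phi_0(x) - \frac{k_1 \Phi_1(x)}{1!} + \frac{k_2 \Phi_2(x)}{2!} - \frac{k_3 \Phi_3(x)}{3!} + \dots \right] dx$$

The general term on the right hand side integral is of
the form

$$\frac{(-1)^s k_s}{s!} \int_0^\infty e^{x\omega} \Phi_s(x) \, dx$$

where the integral may be evaluated by partial
integration as follows:

$$\int_0^\infty e^{x\omega} \Phi_s(x) \, dx = \left[ e^{x\omega} \Phi_{s-1}(x) \right]_0^\infty - \omega \int_0^\infty e^{x\omega} \Phi_{s-1}(x) \, dx.$$

Since both $\Phi_0(x)$ and all its derivatives are supposed
to vanish for $x = 0$ and $x = \infty$ the first term to the
right becomes zero and

$$\int_0^\infty e^{x\omega} \Phi_s(x) \, dx = -\omega \int_0^\infty e^{x\omega} \Phi_{s-1}(x) \, dx.$$

By successive integrations we then obtain the following
recursion formula

$$(-\omega)^1 \int_0^\infty e^{x\omega} \Phi_{s-1}(x) \, dx = (-\omega)^2 \int_0^\infty e^{x\omega} \Phi_{s-2}(x) \, dx$$

$$(-\omega)^2 \int_0^\infty e^{x\omega} \Phi_{s-2}(x) \, dx = (-\omega)^3 \int_0^\infty e^{x\omega} \Phi_{s-3}(x) \, dx$$

$$\dots$$

$$(-\omega)^{s-1} \int_0^\infty e^{x\omega} \Phi_1(x) \, dx = (-\omega)^s \int_0^\infty e^{x\omega} \Phi_0(x) \, dx.$$

Or finally

$$\int_0^\infty e^{x\omega} \Phi_s(x) \, dx = (-\omega)^s \int_0^\infty e^{x\omega} \Phi_0(x) \, dx.$$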



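Expanding $e^{x\omega}$ in a power series we have

$$\int_0^\infty e^{x\omega} \Phi_s(x) \, dx = \frac{(-\omega)^s}{n \sqrt{2\pi}} \int_0^\infty \left[ 1 + x\omega + \frac{x^2 \omega^2}{2!} + \frac{x^3 \omega^3}{3!} + \dots \right] e^{-\frac{1}{2} \left[ \frac{\log x - m}{n} \right]^2} dx.$$

The general term in this expansion is of the form

$$\frac{(-\omega)^s \, \omega^r}{n \sqrt{2\pi} \; r!} \int_0^\infty x^r \, e^{-\frac{1}{2} \left[ \frac{\log x - m}{n} \right]^2} dx$$

which according to the formulas given on page 74
reduces to:

$$(-\omega)^s \, \frac{\omega^r}{r!} \, e^{m(r+1) + \frac{1}{2} n^2 (r+1)^2}.$$

Hence we may write

$$\int_0^\infty e^{x\omega} \Phi_s(x) \, dx = (-\omega)^s \sum_{r=0}^{\infty} \frac{\omega^r}{r!} \, e^{m(r+1) + \frac{1}{2} n^2 (r+1)^2}.$$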
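The reduction used above rests on the closed form for the integral of $x^r \Phi_0(x)$ over the positive axis. The following Python sketch checks that closed form numerically. The function names are illustrative and not taken from the text, the midpoint rule in $t = \log x$ is merely one convenient choice, and the values of $m$ and $n$ are those of the mortality example worked out in paragraph 24.

```python
import math

def phi0(x, m, n):
    """Generating function: a normal curve in log x (no 1/x factor)."""
    z = (math.log(x) - m) / n
    return math.exp(-0.5 * z * z) / (n * math.sqrt(2.0 * math.pi))

def moment_closed(r, m, n):
    """Closed form e^{m(r+1) + n^2 (r+1)^2 / 2} quoted in the text."""
    return math.exp(m * (r + 1) + 0.5 * n * n * (r + 1) ** 2)

def moment_numeric(r, m, n, steps=20000, width=12.0):
    """Midpoint rule for the integral of x^r * phi0(x) over (0, inf), with t = log x."""
    centre = m + (r + 1) * n * n          # where the integrand peaks in t
    lo = centre - width * n
    h = 2.0 * width * n / steps
    total = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * h
        x = math.exp(t)
        total += (x ** r) * phi0(x, m, n) * x   # dx = x dt
    return total * h

if __name__ == "__main__":
    m, n = 2.4504, 0.2106                 # values from the worked example
    for r in range(4):
        print(r, moment_numeric(r, m, n), moment_closed(r, m, n))
```

The two columns printed for each $r$ should agree to many decimal places, which is all the sketch is meant to show.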

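Consequently the relation between the semi-invariants
and the frequency function

$$F(x) = k_0 \Phi_0(x) - \frac{k_1}{1!} \Phi_1(x) + \frac{k_2}{2!} \Phi_2(x) - \frac{k_3}{3!} \Phi_3(x) + \dots$$

can be expressed by the following recursion formula

$$s_0 \, e^{\frac{\lambda_1 \omega}{1!} + \frac{\lambda_2 \omega^2}{2!} + \frac{\lambda_3 \omega^3}{3!} + \dots} = s_0 + \frac{s_1 \omega}{1!} + \frac{s_2 \omega^2}{2!} + \frac{s_3 \omega^3}{3!} + \dots$$

$$= \sum_{v=0}^{\infty} \frac{s_v \omega^v}{v!} = \sum_{s=0}^{\infty} \sum_{r=0}^{\infty} \frac{k_s}{s!} \, \frac{\omega^{s+r}}{r!} \, e^{m(r+1) + \frac{1}{2} n^2 (r+1)^2}.$$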

The constants k are here expressed in terms of 
the unadjusted moments or power sums, s. It is 
readily seen that the Sheppard corrections for 
adjusted moments, M, also apply in this case. 
We are, therefore, able to write down the values of the
$k$'s from the above recursion formula in the following
manner

$$M_0 = k_0 e^{m + \frac{1}{2} n^2}$$

$$M_1 = k_1 e^{m + \frac{1}{2} n^2} + k_0 e^{2m + 2n^2}$$

$$M_2 = k_2 e^{m + \frac{1}{2} n^2} + 2 k_1 e^{2m + 2n^2} + k_0 e^{3m + 4.5 n^2}$$

$$M_3 = k_3 e^{m + \frac{1}{2} n^2} + 3 k_2 e^{2m + 2n^2} + 3 k_1 e^{3m + 4.5 n^2} + k_0 e^{4m + 8 n^2}$$

$$M_4 = k_4 e^{m + \frac{1}{2} n^2} + 4 k_3 e^{2m + 2n^2} + 6 k_2 e^{3m + 4.5 n^2} + 4 k_1 e^{4m + 8 n^2} + k_0 e^{5m + 12.5 n^2}$$
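The five relations above follow one pattern: the coefficients are binomial and the exponent attached to $k_s$ in $M_v$ is $m j + \frac{1}{2} n^2 j^2$ with $j = v - s + 1$. The Python sketch below writes that pattern out and inverts it term by term. The function names and the trial coefficients are hypothetical, chosen only to show the round trip.

```python
import math

def basis(j, m, n):
    """The factor e^{m j + n^2 j^2 / 2} appearing in the relations."""
    return math.exp(m * j + 0.5 * n * n * j * j)

def moments_from_k(k, m, n):
    """M_v = sum over s of C(v, s) * k_s * basis(v - s + 1)."""
    return [
        sum(math.comb(v, s) * k[s] * basis(v - s + 1, m, n) for s in range(v + 1))
        for v in range(len(k))
    ]

def k_from_moments(moments, m, n):
    """Solve the triangular system for k_0, k_1, ... in ascending order."""
    k = []
    for v, moment in enumerate(moments):
        known = sum(math.comb(v, s) * k[s] * basis(v - s + 1, m, n) for s in range(v))
        k.append((moment - known) / basis(1, m, n))
    return k

if __name__ == "__main__":
    m, n = 2.4504, 0.2106                  # parameters of the generating function
    k_trial = [1.0, 0.0, 0.0, -0.03, 0.01]  # hypothetical coefficients
    moments = moments_from_k(k_trial, m, n)
    print(moments)
    print(k_from_moments(moments, m, n))    # should reproduce k_trial
```

Because each $M_v$ introduces exactly one new coefficient $k_v$, no matrix inversion is needed; the system is solved by simple back-substitution from $M_0$ upward.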

It is easy to see that it is not possible to determine the
generating function's parameters $m$ and $n$ from the
observations. These parameters, like $M$ and $\sigma$ in the
case of the Laplacean normal probability curve, must be
chosen arbitrarily. If $m$ and $n$ are selected so as to
make $k_1$ and $k_2$ vanish we have

$$M_0 = k_0 e^{m + \frac{1}{2} n^2}$$

$$M_1 = k_0 e^{2m + 2n^2}$$

$$M_2 = k_0 e^{3m + 4.5 n^2}$$

the solution of which gives

$$e^{n^2} = \frac{M_0 M_2}{M_1^2}, \qquad e^{2m} = \frac{M_1^8}{M_0^5 M_2^3}$$

while

$$k_4 e^{m + \frac{1}{2} n^2} = M_4 - 4 M_3 e^{m + 1.5 n^2} - M_0 e^{4m + 9 n^2} \left( e^{3 n^2} - 4 \right).$$

This theory requires the computation of a set of tables
of the generating function

$$\Phi_0(x) = \frac{1}{n \sqrt{2\pi}} \, e^{-\frac{1}{2} \left[ \frac{\log x - m}{n} \right]^2}$$

and its derivatives. For $\Phi_0(x)$ itself we may of course
use the ordinary tables for the normal curve $\varphi_0(z)$
when we consider

$$z = \frac{\log x - m}{n}.$$

I have calculated a set of tables of the derivatives of
$\Phi_0(x)$ and hope to be able to publish the manuscript
thereof in the second volume of my treatise on "The
Mathematical Theory of Probabilities".

23. PARAMETERS DETERMINED BY LEAST SQUARES

The above development is based upon the theory of
functions and the theory of definite integrals. We shall
now see how the same problem may be attacked by the
method of least squares after we have determined by the
usual method of moments the values of $m$ and $n$ in the
generating function $\Phi_0(x)$.

Viewed from this point of vantage our problem may be
stated as follows:

Given an arbitrary frequency distribution of the variate
$z$ with $z = (\log x - m) / n$ and where $x$ is reckoned
from a zero point or origin, which is situated $\eta$ units
below the mean and defined by the relation

$$\eta^3 \lambda_3 - 3 \eta^2 \lambda_2^2 = \lambda_2^3, \quad \text{where } \alpha = \lambda_1 - \eta;$$

to develop $F(z)$ into a frequency series of the form

$$F(z) = k_0 \varphi_0(z) + k_3 \varphi_3(z) + k_4 \varphi_4(z) + \dots + k_n \varphi_n(z),$$

where the $k$'s must be determined in such a way that
the expression

$$\sum_{i=0}^{n} k_i \varphi_i(z)$$

gives the best approximation to $F(z)$ in the sense of
the method of least squares.

Stated in this form the frequency function is 
reduced to the ordinary series of Gram or the A 
type of the Charlier series, already treated in the 
earlier chapters. 
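As a rough illustration of what the least-squares determination of the $k$'s involves, the following Python sketch fits $k_0$, $k_3$ and $k_4$ by solving the normal equations. Everything here is an assumption made for the example: the function names, the equally spaced grid of $z$ values, and the synthetic "observations", which are generated from known coefficients so that the fit can be checked. It is not the author's own computing scheme, which is described in paragraph 17.

```python
import math

def phi(i, z):
    """Normal curve (i = 0) and its third and fourth derivatives."""
    base = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    if i == 0:
        return base
    if i == 3:
        return -(z ** 3 - 3.0 * z) * base
    if i == 4:
        return (z ** 4 - 6.0 * z ** 2 + 3.0) * base
    raise ValueError("this sketch only uses orders 0, 3 and 4")

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    size = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in reversed(range(size)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - tail) / aug[r][r]
    return x

def fit_k(zs, fs, orders=(0, 3, 4)):
    """Coefficients minimising sum over z of (F(z) - sum_i k_i phi_i(z))^2."""
    normal = [[sum(phi(i, z) * phi(j, z) for z in zs) for j in orders] for i in orders]
    rhs = [sum(phi(i, z) * f for z, f in zip(zs, fs)) for i in orders]
    return solve(normal, rhs)

if __name__ == "__main__":
    zs = [-4.0 + 0.25 * i for i in range(33)]
    true_k = (7361.8, -212.2, -9.6)        # used only to fabricate test data
    fs = [sum(k * phi(i, z) for k, i in zip(true_k, (0, 3, 4))) for z in zs]
    print(fit_k(zs, fs))
```

With real data the values of `fs` would be the observed frequencies at the transformed abscissae, and the residuals would no longer vanish.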



24. APPLICATION TO GRADUATION OF A MORTALITY TABLE

As an illustration of the theory applied to a practical
problem we present the following frequency distribution
by 5-year age intervals of the number of deaths (or
Σd_x by quinquennial grouping) in the recently published
American-Canadian Mortality of Healthy Males, based on
a radix of 100,000 entrants at age 15.

Frequency Distribution of Deaths by Attained
Ages in American-Canadian Mortality Table.

Ages         Σd_x    1st Component    2d Component
15-19       1,801          120            1,681
20-24       1,996          230            1,766
25-29       2,089          440            1,649
30-34       2,120          790            1,330
35-39       2,341        1,370              971
40-44       2,911        2,270              641
45-49       3,937        3,570              367
50-54       5,527        5,400              127
55-59       7,723        7,722                1
60-64      10,383       10,383
65-69      12,987       12,987
70-74      14,535       14,535
75-79      13,807       13,807
80-84      10,328       10,328
85-89       5,464        5,464
90-94       1,757        1,757
95-99         278          278
100-104        16           16

Total     100,000       91,467            8,533

The curve represented by the d_x column is evidently a
composite frequency function compounded of several
series. From a purely mathematical point of view the
compound curve may be considered as being generated in
an infinite number of ways as the summation of separate
component frequency curves. From the point of view of a
practical graduation it is, however, easy to break this
compound death curve up into two separate components.
A mere glance at the d_x curve itself suggests a major
skew frequency curve with a maximum point somewhere in
the age interval from 70 to 75 and a minor curve
(practically one-sided) for the younger ages.

Let us therefore break the Σd_x column up into the two
so far perfectly arbitrary parts as shown in the above
table and then try to fit those two distributions to
logarithmically transformed A curves.

Starting with the first component the straightforward
computation of the semi-invariants is given in the table
below with the provisional mean chosen at age 67.

Frequency Distribution of Deaths in American
Mortality Table, First Component.

Ages         x      F(x)      xF(x)     x²F(x)       x³F(x)
104-100     -7        16       -112        784       -5,488
99-95       -6       278     -1,668     10,008      -60,048
94-90       -5     1,757     -8,785     43,925     -219,625
89-85       -4     5,464    -21,856     87,424     -349,696
84-80       -3    10,328    -30,984     92,952     -278,856
79-75       -2    13,807    -27,614     55,228     -110,456
74-70       -1    14,535    -14,535     14,535      -14,535
69-65        0    12,987          0          0            0

Subtotal          59,172   -105,554    304,856   -1,038,704

64-60       +1    10,383     10,383     10,383       10,383
59-55       +2     7,723     15,446     30,892       61,784
54-50       +3     5,400     16,200     48,600      145,800
49-45       +4     3,570     14,280     57,120      228,480
44-40       +5     2,270     11,350     56,750      283,750
39-35       +6     1,370      8,220     49,320      295,920
34-30       +7       790      5,530     38,710      270,970
29-25       +8       440      3,520     28,160      225,280
24-20       +9       230      2,070     18,630      167,670
19-15      +10       120      1,200     12,000      120,000

Subtotal          32,296     88,199    350,565    1,810,037

s_r               91,468    -17,355    655,421      771,333

Computing the semi-invariants by means of the usual
formulas in paragraph 13, we have:

$\lambda_1 = -17355 / 91468 = -0.18974$, or mean at age
$67 + 5\,(0.19)$ or at age 67.95

$\lambda_2 = 655421 / 91468 - \lambda_1^2 = 7.1296$

$\lambda_3 = 771333 / 91468 - 3 \lambda_1 (655421 / 91468) + 2 \lambda_1^3 = 12.4981.$
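The arithmetic of this step can be retraced with a few lines of Python. The sketch below is only an illustration: the function name is hypothetical, and the frequencies are those of the first component as tabulated above, with $x$ counted in 5-year units from the provisional mean at age 67 (negative toward the older ages).

```python
def semi_invariants(xs, freqs):
    """First three semi-invariants from power sums about a provisional origin."""
    s0 = sum(freqs)
    m1 = sum(x * f for x, f in zip(xs, freqs)) / s0
    m2 = sum(x ** 2 * f for x, f in zip(xs, freqs)) / s0
    m3 = sum(x ** 3 * f for x, f in zip(xs, freqs)) / s0
    lam1 = m1
    lam2 = m2 - m1 ** 2
    lam3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    return lam1, lam2, lam3

# x = -7 (ages 104-100) up to x = +10 (ages 19-15), first component.
XS = list(range(-7, 11))
FREQS = [16, 278, 1757, 5464, 10328, 13807, 14535, 12987,
         10383, 7723, 5400, 3570, 2270, 1370, 790, 440, 230, 120]

if __name__ == "__main__":
    lam1, lam2, lam3 = semi_invariants(XS, FREQS)
    print(lam1, lam2, lam3)
    print("mean age:", 67.0 - 5.0 * lam1)   # x runs opposite to age
```

The printed values should agree with those quoted in the text up to rounding in the last decimal.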

In order to determine the mathematical zero or the
origin we have to solve the following cubic:

$\lambda_3 \eta^3 - 3 \lambda_2^2 \eta^2 = \lambda_2^3$, or
$12.498\, \eta^3 - 152.511\, \eta^2 = 362.47$

the positive root of which is equal to 12.39. The zero
point is therefore found to be situated 12.39 5-year
units from the mean or at age $67.95 + 5\,(12.39)$, i.e.
very nearly at age 130, which we henceforth shall select
as the origin of the coordinate system of the first
component. We have furthermore

$12.39 = e^{m + 1.5 n^2}$, and $7.1296 = e^{2m + 3 n^2} (e^{n^2} - 1) = (12.39)^2 (e^{n^2} - 1)$,

the solution of which gives $n^2 = 0.04436$, $n = 0.2106$,
$m = 2.4504$, all on the basis of a 5-year interval as
unit. If we wish to change to a single calendar year unit
we must add the natural logarithm of 5, or 1.6094, to
the above value of $m$, which gives us $m = 4.0598$, while
$n$ remains the same. The above computations furnish us
with the necessary material for the logarithmic
transformation of the variate $x$ which now may be
written as

$$z = [\log (130 - x) - 4.0598] : 0.2106,$$

where $x$ is the original variate or the age at death.
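The determination of the mathematical zero can be sketched in Python as follows. The function names are hypothetical and bisection is simply one convenient way of locating the single positive root; the sketch assumes a positive third semi-invariant. Run on the semi-invariants quoted above it gives a root very close to 12.39, while the resulting $n$ and $m$ differ from the printed values in the third decimal, which may reflect rounding in the hand computation.

```python
import math

def mathematical_zero(lam2, lam3, iters=200):
    """Positive root eta of lam3*eta^3 - 3*lam2^2*eta^2 - lam2^3 = 0 (needs lam3 > 0)."""
    def f(eta):
        return lam3 * eta ** 3 - 3.0 * lam2 ** 2 * eta ** 2 - lam2 ** 3
    lo, hi = 0.0, 1.0
    while f(hi) < 0.0:          # widen the bracket until the sign changes
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def log_parameters(lam2, eta):
    """Parameters m and n from n^2 = log(1 + lam2/eta^2), m = log(eta) - 1.5 n^2."""
    n2 = math.log(1.0 + lam2 / eta ** 2)
    return math.log(eta) - 1.5 * n2, math.sqrt(n2)

if __name__ == "__main__":
    eta = mathematical_zero(7.1296, 12.4981)
    m, n = log_parameters(7.1296, eta)
    print("eta:", eta)
    print("zero point at age:", 67.95 + 5.0 * eta)
    print("m (5-year unit):", m, " n:", n)
    print("m (1-year unit):", m + math.log(5.0))
```

The last line applies the change of unit described in the text, namely the addition of the natural logarithm of 5 to $m$.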
Having thus accomplished the logarithmic transformation
we may henceforth write the generating function as

$$\Phi_0(x) = \frac{1}{0.2106 \sqrt{2\pi}} \, e^{-\frac{1}{2} \left[ \frac{\log (130 - x) - 4.0598}{0.2106} \right]^2} = \varphi_0(z).$$

We express now $F(x)$ by the following equation

$$F(x) = k_0 \Phi_0(x) + k_3 \Phi_3(x) + k_4 \Phi_4(x) + \dots$$

or in terms of the transformed $z$:

$$F(z) = k_0 \varphi_0(z) + k_3 \varphi_3(z) + k_4 \varphi_4(z) + \dots,$$

and proceed to determine the numerical values of $k$ by
the method of least squares.

The numerical calculation required by this 
method follows precisely along the same lines as 
described in paragraph 17. I shall for this reason 
not reproduce these calculations but limit myself to
quoting the final results for the various coefficients
$k$, which are as follows:¹

$k_0 = 7361.8$; $k_3 = -212.2$; $k_4 = -9.6$.

¹ Interested readers may consult the detailed
computations on pages 246-257 in my Mathematical Theory
of Probabilities (2nd Edition, New York, 1921).

The final equation of the frequency curve of the first
component, $F_I(x)$, is therefore:

$$F_I(x) = 7361.8\, \varphi_0(z) - 212.2\, \varphi_3(z) - 9.6\, \varphi_4(z),$$

where the generating function, $\varphi_0(z)$, is of the
form:

$$\varphi_0(z) = \Phi_0(x) = \frac{1}{0.2106 \sqrt{2\pi}} \, e^{-\frac{1}{2} \left[ \frac{\log (130 - x) - 4.0598}{0.2106} \right]^2}$$

The second component, $F_{II}(x)$, can by means of a
similar process be expressed by the equation:

$$F_{II}(x) = 947.4\, \varphi_0(z) - 63.4\, \varphi_3(z) - 30.0\, \varphi_4(z),$$

where

$$\varphi_0(z) = \Phi_0(x) = \frac{1}{0.12 \sqrt{2\pi}} \, e^{-\frac{1}{2} \left[ \frac{\log (x + 68.8) - 4.532}{0.12} \right]^2}$$

Addition of these two component curves gives us the
ultimate compound frequency curve, representing the
$d_x$ of the mortality table.

A comparison between the observed values of $d_x$ and
the values of $d_x$ as computed from the above equation
is shown in graphical form in the attached diagram.
Evidently the graduation leaves but little to be desired
in the way of closeness of fit.
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Figure 1. 

Diagram showing graduation of d_x column in the AM (5)
table by a compound frequency curve of the Gram-Charlier
types.

25. BIOLOGICAL INTERPRETATION OF MORTALITY

It appears that the Italian statisticians were the first
to break up the d_x curve into a system of five or more
component frequency curves, which, however, were all of
the normal Laplacean type. Pearson, who in a brilliant
essay entitled Chances of Death was the next to attack
the problem, employed a system of five skew frequency
curves. As early as 1914 I found that for ages above 10
the majority of d_x curves in previously constructed
mortality tables could be represented by not more than
two skew frequency curves as shown in the above example
of the AM (5) table.

Although all such investigations may be very 
interesting and useful from the point of view of 
the actuary, we must, however, not overlook the 
fact that the breaking up of the compound d x 
curve in the manner just described is merely an 
empirical process pure and simple. While such 
processes undoubtedly represent very neat methods 
of graduation, a quite different and more im- 
portant question is whether mathematical work 
of this kind allows of a biological interpretation. 
It is evident that from a mere mathematical point 
of view we may break up the d x curve into various 
component parts in an infinite number of ways. 
But while such breaking up processes may be 
extremely interesting as actuarial graduations and 
exercises in pure mathematics, they have evidently 
little connection with the underlying biological 
facts of a mortality table. This aspect of the 
question has been brought out in a very forcible 
manner by the eminent American biometrician, Raymond
Pearl, in his 1920 Lowell Institute Lectures. The whole
subject would appear in a quite different light if it
were possible to give a biological interpretation of the
mathematical analysis and to show that the component
frequency curves as derived from pure mathematics have a
counterpart in actual life. This, I think, would be very
difficult, if not impossible to establish, because it is
not mathematics which determines the conduct or behavior
of living organisms. One might, however, view the whole
problem from the standpoint of the biologist rather than
from the standpoint of the mathematician. The problem
then is to ascertain whether
the observed biological facts as shown in the 
collected statistical data allow of a mathematical 
interpretation, rather than to find a biological 
interpretation and counterpart of previously 
established empirical formulae. 

It is to this important question that I have 
devoted the entire discussion of the second chapter 
of this book. I have proceeded from certain 
observed biological facts (in this particular 
instance the statistics on the number of deaths 
by sex and attained ages from more than 150 
causes of death) which represent the natural 
phenomena under investigation. In order to offer a
rational explanation of these facts and to interpret
their quantitative relationships, I have adopted as a
working hypothesis the supposition that the deaths
according to attained age and sex among the survivors of
a homogeneous cohort of say 1,000,000 entrants at age 10
tend to cluster around specific ages in such a manner
that their frequency distribution by attained ages can
be represented by a limited number of sets of
Gram-Charlier or Poisson-Charlier frequency curves.

On the basis of this hypothesis we can now 
by simple mathematical deductions construct a 
mortality table from deaths by sex, age and cause 
of death and without any information about the 
lives exposed to risk at various ages. 

Finally we can verify the ultimate results 
contained in this final mortality table by working 
back from the table to the data originally 
observed. 

This procedure is in strict conformity with the model of
modern science, which according to Jevons consists of
the four processes of observation, hypothesis, deduction
and verification.

The important factor in this investigation, 
and one which most actuaries and statisticians 
fail to grasp, is that I have looked at the whole 
problem as a biometrician rather than as a 
mathematician. Mathematics has been employed 
only as a working tool in the whole process, and 
the reason that the method has met with success 
must be sought for in concrete biological facts 
and not in the realm of mathematics.

26. POISSON'S PROBABILITY FUNCTION

In certain statistical series it frequently happens that
the semi-invariants of higher order than zero all are
equal, or that

$$\lambda_1 = \lambda_2 = \lambda_3 = \dots = \lambda_r = \lambda.$$

We shall for the present limit our discussion to
homograde statistical series where the variates always
are positive and integral, and where therefore the
definition of the semi-invariants is of the form:

$$e^{\frac{\lambda \omega}{1!} + \frac{\lambda \omega^2}{2!} + \frac{\lambda \omega^3}{3!} + \dots} \sum \varphi(x) = \sum \varphi(x) e^{x\omega} = \varphi(0) e^{0\omega} + \varphi(1) e^{1\omega} + \varphi(2) e^{2\omega} + \varphi(3) e^{3\omega} + \dots,$$

or

$$e^{\frac{\lambda \omega}{1!} + \frac{\lambda \omega^2}{2!} + \frac{\lambda \omega^3}{3!} + \dots} = e^{-\lambda} e^{\lambda e^{\omega}} = \sum \psi(x) e^{x\omega} \quad \text{for } x = 0, 1, 2, 3, \dots,$$

which also can be written as

$$e^{-\lambda} \left( 1 + \frac{\lambda e^{\omega}}{1!} + \frac{\lambda^2 e^{2\omega}}{2!} + \dots \right) = \psi(0) \cdot 1 + \psi(1) e^{\omega} + \psi(2) e^{2\omega} + \dots$$

The coefficient of $e^{r\omega}$ gives the relative
frequency or the probability for the occurrence of
$x = r$, and we find therefore that

$$\varphi(x) = \psi(r) = \frac{e^{-\lambda} \lambda^r}{r!}.$$

This is the famous Poisson Exponential, so called after
the French mathematician, Poisson, who first derived
this expression in his Recherches sur la probabilité des
jugements, but in an entirely different manner than the
one we have indicated above.
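The defining property of the Poisson Exponential, namely that its semi-invariants are all equal, can be checked numerically for the first three orders. The Python sketch below is an illustration only; the function names are hypothetical, and the sums are cut off at an upper limit beyond which the probabilities are negligible for the chosen parameter.

```python
import math

def poisson(r, lam):
    """Poisson Exponential e^{-lam} * lam^r / r!."""
    return math.exp(-lam) * lam ** r / math.factorial(r)

def first_three_semi_invariants(lam, upper=60):
    """Semi-invariants of orders 1, 2, 3 computed from the raw moments."""
    probs = [poisson(r, lam) for r in range(upper)]
    m1 = sum(r * p for r, p in enumerate(probs))
    m2 = sum(r ** 2 * p for r, p in enumerate(probs))
    m3 = sum(r ** 3 * p for r, p in enumerate(probs))
    return m1, m2 - m1 ** 2, m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3

if __name__ == "__main__":
    print(first_three_semi_invariants(3.88))   # three values, each close to 3.88
```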

The Poisson Exponential opens a new way for the
treatment of statistical series which possess the
attribute that all their semi-invariants of higher order
than zero are equal, or nearly equal. It is readily seen
that whereas the Laplacean probability function
$\varphi_0(x)$ contains two parameters $\lambda_1$ and
$\sigma$, the probability function of Poisson contains only
one parameter, $\lambda$.

27. POISSON-CHARLIER FREQUENCY CURVES

We have already seen in the previous chapters that the
Gram-Charlier frequency curve could be written as

$$F(x) = \sum c_i \varphi_i(x) = \sum c_i H_i(x) \varphi_0(x) \quad \text{for } i = 0, 1, 2, 3, \dots$$

where $\varphi_0(x)$ is the generating Laplacean
probability function.

The idea now immediately suggests itself to use a
similar method of expansion in the case of the Poisson
probability function and to employ this exponential as a
generating function in the same manner as the Laplacean
function. We are, however, in the present case of the
Poisson exponential dealing with a generating function
which so far has been defined for positive integral
values only and, therefore, represents a discrete
function. For this reason it will be impossible to
express the series as the sum-products of the successive
derivatives of the generating function and their
correlated parameters $c$. We can, however, in the case
of integral variates express the series by means of
finite differences and write $F(x)$ as follows:

$$F(x) = c_0 \psi(x) + c_1 \Delta\psi(x) + c_2 \Delta^2\psi(x) + \dots \qquad (I)$$

where $\psi(x) = e^{-m} m^x / x!$ for $x = 0, 1, 2, 3, \dots$,
and

$$\Delta\psi(x) = \psi(x) - \psi(x - 1),$$

$$\Delta^2\psi(x) = \Delta\psi(x) - \Delta\psi(x - 1) = \psi(x) - 2\psi(x - 1) + \psi(x - 2).$$

The series (I) is known as the Poisson-Charlier
frequency series or Charlier's B type of frequency
curves.

The semi-invariants of these frequency series are given
by the following relation:

$$e^{\frac{\lambda_1 \omega}{1!} + \frac{\lambda_2 \omega^2}{2!} + \frac{\lambda_3 \omega^3}{3!} + \dots} = \sum_{x=0}^{\infty} e^{x\omega} F(x).$$

Expanding and equating the coefficients of equal powers
of $\omega$ we have:

$$\lambda_0 = 1 = c_0 \sum \psi(x) \quad \text{or} \quad c_0 = 1$$

$$\lambda_1 = \sum x \left( \psi(x) + c_1 \Delta\psi(x) + c_2 \Delta^2\psi(x) + \dots \right) \qquad (II)$$

$$\lambda_1^2 + \lambda_2 = \sum x^2 \left( \psi(x) + c_1 \Delta\psi(x) + c_2 \Delta^2\psi(x) + \dots \right)$$

We now have

$$\sum \psi(x) = 1, \quad \text{and}$$

$$\sum x \psi(x) = \sum m e^{-m} m^{x-1} / (x - 1)! = m \sum \psi(x - 1) = m.$$

We also find from well-known formulas of the calculus of
finite differences that¹

$$\sum x^2 \psi(x) = m^2 + m$$

$$\sum x \Delta\psi(x) = -1$$

$$\sum x \Delta^2\psi(x) = 0$$

$$\sum x^2 \Delta\psi(x) = -(2m + 1)$$

$$\sum x^2 \Delta^2\psi(x) = 2$$

¹ These formulas can also be derived from the definition
of the semi-invariants and the well-known relations
between moments and semi-invariants as given on page 74
when we remember that according to our definition all
semi-invariants in the Poisson exponential are equal to
$m$.

Substituting these values in (II) we obtain

$$\lambda_1 = m - c_1$$

$$\lambda_1^2 + \lambda_2 = m^2 + m - (2m + 1) c_1 + 2 c_2$$

By letting $m = \lambda_1$ we can make the coefficient
$c_1$ vanish, which results in

$$\lambda_1 = m$$

$$c_2 = \tfrac{1}{2} [\lambda_2 - m]$$

where the two semi-invariants $\lambda_1$ and $\lambda_2$ are
calculated around the natural zero of the number scale
as origin.
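The sums of finite differences used in this derivation, and the resulting rule for $m$ and $c_2$, can be verified with a short Python sketch. The function names are hypothetical, backward differences are taken with the Poisson exponential set to zero for negative arguments, and the infinite sums are truncated where the terms are negligible.

```python
import math

def psi(x, m):
    """Poisson exponential; taken as zero for negative x."""
    if x < 0:
        return 0.0
    return math.exp(-m) * m ** x / math.factorial(x)

def delta(x, m, order):
    """Backward difference of psi of the given order (0, 1, 2, ...)."""
    if order == 0:
        return psi(x, m)
    return delta(x, m, order - 1) - delta(x - 1, m, order - 1)

def weighted_sum(power, order, m, upper=80):
    """Sum over x of x^power times the order-th difference of psi."""
    return sum(x ** power * delta(x, m, order) for x in range(upper))

def modulus_and_eccentricity(lam1, lam2):
    """m = lam1 makes c_1 vanish; then c_2 = (lam2 - m) / 2."""
    return lam1, 0.5 * (lam2 - lam1)

if __name__ == "__main__":
    m = 3.88
    print(weighted_sum(2, 0, m), m * m + m)
    print(weighted_sum(1, 1, m), weighted_sum(1, 2, m))
    print(weighted_sum(2, 1, m), -(2.0 * m + 1.0))
    print(weighted_sum(2, 2, m))
    print(modulus_and_eccentricity(3.8754, 3.6257))
```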

For the above discussion we have limited 
ourselves to the determination of the three con- 
stants m, c and c 2 . It is easy, however, to find 
the higher parameters $c_3, c_4, c_5, \dots$ from the
relations between the moments of the Poisson function
and the semi-invariants of order 3, 4, 5, etc. Charlier
usually calls the parameter $m$ the modulus and $c_2$ the
eccentricity of the B curve.

28. NUMERICAL EXAMPLES

As an illustration of the application of the
Poisson-Charlier series we select the following series
of observations on alpha particles radiated from a bar
of Polonium as determined by Rutherford and Geiger.

The appended table states the number of times, F(x),
that the number of particles given off in a long series
of intervals, each lasting one-eighth of a minute, had a
given value x:

 x    F(x)      x    F(x)      x    F(x)
 0      57      5     408     10      10
 1     203      6     273     11       4
 2     383      7     139     12       0
 3     525      8      45     13       1
 4     532      9      27     14       1

We are here dealing with integral variates which can
assume positive values only and the observations are
therefore eminently adaptable to the treatment by
Poisson-Charlier curves. Selecting the natural zero as
the origin of the coordinate system we find that the
first two semi-invariants are of the form

$\lambda_1 = 3.8754$, $\lambda_2 = 3.6257$, and we therefore have:
$m = \lambda_1 = 3.88$; $c_2 = \tfrac{1}{2} [\lambda_2 - m] = -0.125$.

The equation for the frequency distribution of the total
N = 2608 elements therefore becomes

$$F(x) = N \left[ \psi_{3.88}(x) + (-0.125)\, \Delta^2 \psi_{3.88}(x) \right].$$

The table below gives the values as fitted to the curve,
F(x):

Alpha Particles Discharged from Film of Polonium
(Rutherford and Geiger).
N = 2608, m = 3.88, c_2 = -0.125

(1)      (2)          (3)         (4)        (5)          (6)
 x       ψ(x)        Δ²ψ(x)      N×(2)    N×(3)×c_2     (4)+(5)
 0     .020668     +.020668       53.9       -6.7          47
 1     .080156     +.038820      209.0      -12.7         196
 2     .155455     +.015811      405.4       -5.2         400
 3     .201015     -.029793      524.2       +9.7         533
 4     .194967     -.051608      508.5      +16.8         525
 5     .151625     -.037654      394.5      +12.3         407
 6     .097850     -.009714      254.9       +3.2         258
 7     .054249     +.009814      141.2       -3.2         138
 8     .026316     +.015668       68.7       -5.1          64
 9     .011351     +.012968       29.6       -4.2          25
10     .004407     +.008021       11.5       -2.6           9
11     .001555     +.004092        4.1       -1.2           3
12     .000503     +.001800        1.3       -0.6           1
13     .000150     +.000699        0.4       -0.2
14     .000042     +.000245        0.1       -0.1
15     .000010     +.000076        0.0       -0.0
16     .000003     +.000025
17     .000001     +.000005
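The fitted column can be retraced approximately with the Python sketch below. The function names are hypothetical. The observed counts are the Rutherford and Geiger figures, with 139 taken for $x = 7$, that being the value for which the counts add up to the stated total of 2608. The sketch uses the rounded parameters $m = 3.88$ and $c_2 = -0.125$, so its output may differ from the printed table in the last figure.

```python
import math

OBSERVED = [57, 203, 383, 525, 532, 408, 273, 139, 45, 27, 10, 4, 0, 1, 1]

def psi(x, m):
    """Poisson exponential; taken as zero for negative x."""
    if x < 0:
        return 0.0
    return math.exp(-m) * m ** x / math.factorial(x)

def second_difference(x, m):
    """Second backward difference psi(x) - 2 psi(x-1) + psi(x-2)."""
    return psi(x, m) - 2.0 * psi(x - 1, m) + psi(x - 2, m)

def fitted(x, total, m, c2):
    """Poisson-Charlier B curve with c_0 = 1 and c_1 = 0."""
    return total * (psi(x, m) + c2 * second_difference(x, m))

if __name__ == "__main__":
    total, m, c2 = sum(OBSERVED), 3.88, -0.125
    for x, obs in enumerate(OBSERVED):
        print(x, obs, round(fitted(x, total, m, c2), 1))
```

Since the second differences of the Poisson exponential sum to zero, the fitted values add up to the total N whatever the eccentricity, which is a useful check on the arithmetic.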












As a second example we offer our old friend, the
distribution of flower petals in Ranunculus bulbosus.
Selecting the zero point at x = 5 and computing the
semi-invariants in the usual manner we obtain the
following equation for the frequency curve.

$$F(x) = 222\, \psi(x) + 31.5\, \Delta^2 \psi(x), \qquad m = 0.631.$$

A comparison between calculated and observed values
follows:

 x     F(x)     Obs.
 5    134.9     133
 6     51.6      55
 7     22.5      23
 8      9.5       7
 9      2.9       2
10      0.6       2

29. TRANSFORMATION OF THE VARIATE

For integral variates we have shown that the Poisson
frequency curve possesses the important property that
all its semi-invariants are equal. Now while a frequency
distribution of a certain integral variate, $x$, may
perhaps not possess this property, it may, however, very
well happen after a suitable linear transformation has
been made, that the variate thus transformed will be
subject to the laws of Poisson's function.

Let $z = ax - b$ represent the linear transformation
which is subject to the above laws with a series of
semi-invariants all equal to $m$.

These semi-invariants according to the properties set
forth in paragraph 5 are therefore

$$m = \lambda_1(z) = a \lambda_1(x) - b$$

$$m = \lambda_2(z) = a^2 \lambda_2(x)$$

$$m = \lambda_3(z) = a^3 \lambda_3(x)$$

and our problem is to find the unknown parameters $a$,
$b$ and $m$.

Simple algebraic methods, which it will not be necessary
to dwell upon, give the following results:

$$a = \lambda_2 : \lambda_3$$

$$m = \lambda_2^3 : \lambda_3^2$$

$$b = a \lambda_1 - m$$

As a numerical illustration of this transformation we
choose from Jørgensen a series of observations by
Davenport on the frequency distribution of glands in the
right foreleg of 2000 female swine.

No. of Glands    0    1    2    3    4    5    6   7   8  9  10
Frequency       15  209  365  482  414  277  134  72  22  8   2

The values of the three first semi-invariants are

$\lambda_1 = 3.501$, $\lambda_2 = 2.825$, $\lambda_3 = 2.417$,

$a = 2.825 : 2.417 = 1.168$,

$m = 2.825^3 : 2.417^2 = 3.859$,

$b = (1.168)(3.501) - 3.859 = 0.230$.
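The three formulas for $a$, $m$ and $b$ are simple enough to check directly. The following Python sketch, with a hypothetical function name, applies them to the semi-invariants of the gland distribution; carried out in full precision the value of $b$ comes out slightly above 0.230, the printed figure being based on the rounded value $a = 1.168$.

```python
def poisson_transform(lam1, lam2, lam3):
    """Return (a, b, m) so that z = a*x - b has its first three semi-invariants equal to m."""
    a = lam2 / lam3
    m = lam2 ** 3 / lam3 ** 2
    b = a * lam1 - m
    return a, b, m

if __name__ == "__main__":
    a, b, m = poisson_transform(3.501, 2.825, 2.417)
    print("a =", a, " b =", b, " m =", m)
    # The transformed semi-invariants should all equal m.
    print(a * 3.501 - b, a ** 2 * 2.825, a ** 3 * 2.417)
```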

The new variable then becomes $z = ax - b$ and the
transformed Poisson probability function takes on the
form:

$$\psi(z) = \frac{e^{-m} m^z}{z!}.$$

In general, however, we will find that $z$ is not a whole
number and the expression $z!$ therefore has no meaning,
from the point of view of factorials at least. This
difficulty may, however, be overcome through the
introduction of the well-known Gamma Function,
$\Gamma(z + 1)$, which holds true for any positive or
negative real value of $z$ (the negative whole numbers
excepted) and which in the case of integral values of
$z$ reduces to $\Gamma(z + 1) = z!$.

Hence we can write the transformed Poisson probability
function as

$$\psi(z) = \frac{e^{-m} m^z}{\Gamma(z + 1)}.$$

Tables to 7 decimal places of the Gamma Function, or
rather for the expression $-\log \Gamma(z + 1)$, have been
computed by Jørgensen in his Frekvensflader og
Korrelation from $z = -5$ to $z = 15$, progressing by
intervals of 0.01.

By means of this table and the tables of ordinary
logarithms it is now easy to find the values of ψ(z) in
the case of the example relating to the number of glands
in female swine. The detailed computation is shown
below.¹

(1)    (2)        (3)          (4)            (5)            (6)      (7)
 x      z     -log Γ(z+1)    log m^z    (3)+(4)+log e^-m     ψ(z)     F(x)
 0    -.230      .9209        .8651        .1101-2          .0129     30.1
 1    +.938      .0108        .5500        .8849-2          .0767    179.2
 2    2.106      .6555        .2350        .2146-1          .1639    382.9
 3    3.274      .0679        .9199        .3119-1          .2051    479.1
 4    4.442      .3216        .6048        .2501-1          .1780    415.8
 5    5.610      .4547        .2897        .0685-1          .1171    273.6
 6    6.778      .4904        .9746        .7891-2          .0615    143.7
 7    7.946      .4446        .6595        .4282-2          .0268     62.6
 8    9.114      .3285        .3444        .9970-3          .0099     23.1
 9   10.282      .1506        .0294        .5041-3          .0032      7.5
10   11.450      .9177        .7143        .9561-4          .0009      2.1

¹ The characteristics of the logarithms have been
omitted in this table (except in column 5) and only the
positive mantissas are shown. Column 7 represents the
2000 individual observations pro rated according to
column 6.
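With a modern library the logarithmic tables are no longer needed, and columns 6 and 7 can be retraced directly. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, `math.lgamma` stands in for the printed tables of the Gamma Function, and the 2000 observations are pro rated in proportion to ψ(z), as the footnote describes.

```python
import math

def psi_gamma(z, m):
    """Poisson exponential extended to non-integral z through the Gamma Function."""
    return math.exp(-m + z * math.log(m) - math.lgamma(z + 1.0))

A, B, M, TOTAL = 1.168, 0.230, 3.859, 2000   # rounded values from the text

zs = [A * x - B for x in range(11)]          # transformed variate z = a*x - b
values = [psi_gamma(z, M) for z in zs]       # column (6)
scale = TOTAL / sum(values)
fitted = [scale * v for v in values]         # column (7), pro rated to 2000

if __name__ == "__main__":
    observed = [15, 209, 365, 482, 414, 277, 134, 72, 22, 8, 2]
    for x, (z, v, f, obs) in enumerate(zip(zs, values, fitted, observed)):
        print(x, round(z, 3), round(v, 4), round(f, 1), obs)
```

Working in logarithms inside `psi_gamma` mirrors the layout of the hand computation, where the three logarithms of columns 3 to 5 are added before the antilogarithm is taken.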



CHAPTER II 

(TRANSLATED BY MR. VIGFUSSON) 



THE HUMAN DEATH CURVE 

1. INTRODUCTORY REMARKS

In the following paragraphs I intend to discuss a method
of constructing mortality tables
from mortuary records by sex, age and cause of 
death, but without reference to or knowledge of 
the exposed to risk at various ages. This proposed 
method is indeed one which has been severely 
criticized in certain quarters, and several critics
flatly deny that it is possible to construct mortality
tables from such data without detailed information of
the exposed to risk. It is, however, a very dangerous
practice to say that a certain thing is impossible. The
true scientist, least of all, should attempt to set
limits for the extension of human knowledge. It is still
remembered how the great Auguste Comte once denied that
it ever would be possible to determine the chemical
constituents of the celestial bodies. Only a few years
after this emphatic denial by the brilliant Frenchman
the spectroscope was discovered, by means of
which we have been able to detect a number of 
chemical elements of other worlds than that of 
our own little earth. It is but fair to say that the 
method which we here shall describe has met with 
rather determined opposition in certain actuarial 
quarters. Under such circumstances it is natural 
that the process will be viewed in a light of scep- 
ticism and criticism. I welcome such an attitude 
because it has been my purpose to present the 
following studies for further investigation and not 
to force them upon my readers as authoritative 
or as a kind of infallible dogma. 

In presenting the outlines of the proposed 
method I wish to state that it has never been the 
intention to supplant the orthodox methods of 
constructing mortality tables where we have ex- 
act information of the so-called "exposed to risk" 
or number living at various ages. Numerous and 
very important examples, however, offer them- 
selves in actuarial and statistical practice where 
such information is not available. Most of the 
greater American Life Insurance Companies, 
especially those writing the so-called industrial 
insurance, have on hand an enormous amount of 
information of deaths by sex, attained age and by 
cause of death among their policyholders. Even 
the mortuary records of certain occupations, as 
for instance metal and coal miners, among the death
claims in the industrial class are so numerous that it
would be possible to construct a mortality table for
such professions if we knew the exact number exposed to
risk at various ages.
Such information is, however, in the majority of 
cases wanting, or could only be obtained by means 
of a great expenditure of time and labor. Again, 
as Mr. P. S. Crum has pointed out in an article 
in the "Insurance and Commercial Magazine", a 
number of cities and states in United States give 
from year to year very detailed information in 
regard to mortuary records by sex, age at death 
and cause of death. On account of the intense 
migration taking place in certain sections of the 
United States, especially in those of an industrial 
character, it is, however, impossible to know the 
exact population at various ages, except in the 
particular years in which the federal or state 
census has been taken. Since for all but
a few states of this country the intercensal period
is no less than ten years, the determination of the
population composition by age and sex for a given 
locality and intercensal year, with any degree of 
accuracy, becomes a practical impossibility without 
a special count. Such a count or census of a 
specific locality or a single city is, however, a 
costly undertaking at its best, for which the nec- 
essary funds are rarely available. In all such 
instances the mortuary records are practically
worthless in so far as the construction and computation
of death rates are concerned, if we are
to rely solely upon the usual method of construct- 
ing mortality tables. It will therefore readily be 
seen that, apart from purely academic interests, 
the possibility of establishing a method of con- 
structing mortality tables without knowing the 
population exposed to risk at various ages would 
be of great practical value, and I deem no apology 
necessary to present the following method, which 
intends to overcome this very obstacle of having 
no information of the exposures. 

2. EMPIRICAL AND INDUCTIVE METHODS OF SOLUTION

In order to bring the method into the proper
perspective it will be of value to contrast it
with the ordinary methods followed in the
construction of mortality tables. Let us therefore
briefly review those methods and principles
commonly employed by actuaries and statisticians. A
certain number, say $L_x$ persons at age $x$, are kept
under observation for a full calendar year and the
number, $D_x$, who die among the original entrants
during the same year is recorded. The ratio
$D_x : L_x$ is then considered as the crude probability
of dying at age $x$. Similar crude rates are obtained
for all other ages and are then subjected to
a more or less empirical process of graduation to
smooth out the irregularities arising from what is
considered as random sampling. One then chooses
an arbitrary radix, say for instance 100,000 persons
at age 10, which represents a hypothetical
cohort of 10-year old children entering under our
observation. This radix is then multiplied by the
previously constructed value of $q_{10}$ and the product
represents the number dying at age 10. This
number, $d_{10}$, is subtracted from $l_{10}$ or 100,000 and
the difference is the number living at age 11 or
$l_{11}$. This latter number is then multiplied by $q_{11}$
and the result is $d_{11}$, or the number dying at age
11 out of the original cohort of 100,000. In this
way one continues for all ages up to 105, or so.
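The recursion just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `life_table_from_q` and the three sample values of $q_x$ are invented for the example and are not taken from any actual table.

```python
def life_table_from_q(q_values, radix=100_000, start_age=10):
    """Build (age, l_x, d_x) rows from graduated rates q_x.

    q_values[i] is the rate at age start_age + i; the radix is arbitrary.
    """
    rows = []
    living = float(radix)
    for offset, q in enumerate(q_values):
        dying = living * q          # d_x = l_x * q_x
        rows.append((start_age + offset, living, dying))
        living -= dying             # l_{x+1} = l_x - d_x
    return rows


# Hypothetical rates for ages 10, 11 and 12 (illustrative only).
table = life_table_from_q([0.0020, 0.0021, 0.0023])
for age, l_x, d_x in table:
    print(age, round(l_x, 2), round(d_x, 2))
```

Here the column of rates is the input and the columns of the living and the dying follow from it, which is the order of dependence the empirical method implies.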

It is to be noted that the column of $q_x$ in this
process represents the fundamental column while
the columns of $l_x$ and $d_x$ are purely auxiliary
columns.

Allow us here to ask a simple question. Do 
these empirically derived numbers of deaths at 
various ages out of an original cohort of 100,000 
entrants at age 10 give us any insight or clue as 
to the exact nature of the biological phenomenon 
known as death, and are we by this method enab- 
led to lift the veil and trace the numerous causes 
which must have been at work and served to produce
the total effect, the $d_x$ curve, of which we
by means of the usual methods have a purely
empirical representation? I fear that this question
will have to be answered in the negative. The 
usual actuarial methods do not give us a single 
glance into the relation between cause and effect, 
which after all is the ultimate object of investiga- 
tion for all real science. Probably some critics 
would answer that they are not interested in in- 
vestigating causal relations. Such an attitude of 
indifference is, however, very dangerous for a sta- 
tistician or an actuary whose very work rests upon 
the validity of the law of causality. We may, 
however, overlook this apparent inconsistency of 
the empiricists and turn our attention to the proposed
methods of constructing mortality tables
along inductive lines, or by the process which
Jevons has termed a complete induction. 

Such a process we should find diametrically 
opposite to the methods of the empiricists, both in 
respect to points of attack and deduction. In the 
case of the empiricists the $q_x$ is the initial and
fundamental function from which the $d_x$ column
is computed as a mere by-product. The rationalistic
method starts with the $d_x$ column and terminates
with the $q_x$ as the by-product.
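The reversed order of computation can be sketched as follows. The sketch assumes a closed cohort in which every entrant eventually dies, so that the number living at any age is the sum of all later deaths; the function name and the small death column are hypothetical.

```python
def q_from_deaths(deaths):
    """Derive q_x from a complete column of deaths d_x."""
    living = sum(deaths)            # l at the first age = all future deaths
    rates = []
    for d in deaths:
        if living <= 0:             # cohort already extinct
            break
        rates.append(d / living)    # q_x = d_x / l_x
        living -= d                 # survivors entering the next age
    return rates


# Hypothetical cohort of 1,000 dying out over three ages (illustrative only).
rates = q_from_deaths([200, 300, 500])
print(rates)
```

The last rate is necessarily unity, since the final age group carries off all who remain.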

Being primarily interested in the absolute 
number of deaths and not in the relative frequen- 
cies of deaths at various ages, our first question 
is therefore, "What is the form of the frequency
curve representing the deaths at various ages
among the survivors of the original group of 
100,000 entrants at age 10?" Right here we can, 
strange to say, apply some purely a priori know- 
ledge. We know a priori that the curve must be 
finite in extent, because of the very fact that there 
is a definite limit to human life, and we also know 
that it assumes only positive values. There can be 
no negative numbers of deaths unless we were to 
regard the reported theological miracles of resurrections
from the Jewish-Christian religion as
such. This information about the death curve, or
the curve of $d_x$, is, however, not sufficient for use
as a basis for our deductions. We must therefore 
look about for additional information, whether of 
an a priori or an a posteriori nature and of such 
a general character that it can be adopted as a 
hypothesis. 

3. GENERAL PROPERTIES OF THE "DEATH CURVE"

It was Poincaré who once said that every
generalization is a hypothesis. Hence we shall
look for some general characteristics which all
mortality tables have in common in the age
interval under consideration (age 10 and upwards).
Let us take any mortality table, I do
not care from what part of the world, and
examine the general trend of the curve traced
by the values of $d_x$ for various ages. The curve
rises gradually from the age of ten. The number
of deaths among the survivors at
various ages will increase, although not uniformly,
until the ages around 70 or 75 are reached. At this 
age interval we generally encounter a maximum. 
From the ages between 70 and 75 and for higher 
ages the number of deaths among the survivors 
will decrease at a more rapid rate than at the 
earlier stages of life. After the age of 85 only a 
small number of the veteran cohort are still alive. 
After the age of 90 only a few centenarians 
struggle along, keeping up a hopeless fight with 
the grim reaper, Death, until eventually all are 
carried off between the ages of 110 and 115. We 
can much better illustrate this process of the 
struggle between the surviving members at va- 
rious ages of the cohort and the opposing forces as 
marshalled by the ultimate victor, Death, through 
a graphical representation. The chart on page 114 
shows a mortality graph of the male population 
in Denmark (1906-1910) from ages 10 and up- 
wards as constructed by the Royal Danish Stati- 
stical Bureau. The ordinates of the curve show 
the number of deaths at various ages among the 
survivors of the original cohort of 100,000 entrants 
at age 10. We notice a gradual increase from the
younger ages until the age of 77, where a maximum
or high crest is encountered. From that age
a rapid decline takes place until the curve ap- 
proaches the abscissa with a strongly marked 
asymptotic tendency after the age of 90. At the 
age of 110 all the members of the cohort have lost 
out and death stands as the undisputed victor, a 
victor among a mass of graves. The curve we thus 
have traced may properly be called "The Curve of 
Death". On the same chart I have also shown 
a graphical representation of a comparison between 
the Danish death curve and the corresponding 
death curves of males for England and Wales in
the period 1909-1911, Norway 1900-1910,
France 1908-1913 and the United States in the period
1909-1911, all based upon an original radix of
1,000,000 entrants at age 10.

We will notice quite important variations in 
these curves. The curves for the Scandinavian 
countries show a relatively heavy clustering around 
the maximum point which in the case of Den- 
mark is reached at age 75, in England at age 73, 
and in France at age 72. The Danish curve is also 
more symmetrical and shows a more uniform clu- 
stering tendency around the maximum value than 
the other curves. The asymmetry or skewness is 
most pronounced in the American curve, due to 
the comparatively greater number of deaths at
younger ages than in the other tables. In the
curve for Norwegian males I might mention
another peculiarity which is absent in most other
death curves. I have reference here to a secondary 
minor maximum or miniature crest at the age of 
21. This maximum point, which is not very pronounced,
arises from the heavy mortality among
youths in Norway, whose male population always
has consisted of rovers of the sea. A much larger
proportion of young men braves the terrors of the
sea in Norway than in any other country in the world.
These sturdy descendants of the Vikings can be
found in all parts of the globe. You are sure to 
find a weatherbeaten Norwegian tramp steamer 
even in the most deserted and far away harbours 
of our continents. But the sea takes its toll. The 
result is shown in the little peak in the curve of 
death among these sturdy Norwegian youths. 1 

Despite all these smaller irregularities all the 
curves have, however, certain well defined characteristics,
namely:

1) An initial increase with age.

2) A well defined maximum point around the
age period 70-80.

3) A more rapid decline from that point until
the ultimate end of the mortality table.



1 Another factor is the high number of deaths from
tuberculosis typical of youth. See in this connexion
discussion in paragraph 12 a under the Japanese Table.

4. RELATION OF FREQUENCY CURVES

The most interesting of these common
characteristics is the encountering of a maximum
point in the neighborhood of 70, and the
subsequent decline toward the higher ages. This 
fact has a very important biometric significance, 
which we shall discuss in a somewhat detailed 
manner. Most of my readers are familiar with the 
so-called probability curve, expressed by the 
equation : 
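The equation meant here is the familiar normal law, which in one common notation reads as follows. The symbols $M$ for the mean and $\sigma$ for the dispersion are assumptions of this sketch and may differ from the notation used elsewhere in the book.

```latex
\varphi(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - M)^2}{2\sigma^2}}
```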

This Laplacean or normal curve is represented in
graphical form by the beautiful bell-shaped curve
so well known to mathematical readers. Various 
approximations to this curve are continually en- 
countered in numerous instances of observations 
relating to certain biological phenomena where 
certain measurable attributes of various sample 
populations tend to cluster around a certain norm, 
such as the measurements of heights of recruits, 
fin rays in fish, etc. We also know that where this 
tendency to cluster around the mean is asymmetri- 
cal or skew, it is in many cases possible to give 
a very close representation by the Laplacean- 
Charlier frequency curves. 

Now let us return to our curves of death. It
will be noted that all these curves for ages above
the crest period 70 to 75 to a very marked degree 
approach the form of the normal probability curve 
and exhibit a marked clustering tendency around 
this particular period. The ages around 70, the 
Bible's "three score and ten", can therefore be 
looked upon as a norm of life around which the 
deaths of the original cohort group themselves 
in more or less correspondence with the binomial 
probability law. This pronounced grouping ten- 
dency is a very significant biological phenomenon, 
which it might be of interest to dwell upon. 
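The clustering around a norm of life can be given a rough numerical form. The sketch below, which is an illustration and not a procedure taken from the text, takes the age with the most deaths as the norm and measures the dispersion from the deaths above that age only, on the assumption that the upper half of the curve follows the normal law. The death column is synthetic, generated from a normal curve with mean 72 and dispersion 8.

```python
import math


def normal_age_and_dispersion(deaths_by_age):
    """Return (modal age, dispersion estimated from deaths above the mode)."""
    mode = max(deaths_by_age, key=deaths_by_age.get)
    upper = {x: d for x, d in deaths_by_age.items() if x > mode}
    # Half of the deaths at the modal age itself are counted with the upper half.
    weight = sum(upper.values()) + 0.5 * deaths_by_age[mode]
    second_moment = sum(d * (x - mode) ** 2 for x, d in upper.items())
    return mode, math.sqrt(second_moment / weight)


# Synthetic deaths (illustrative only): a normal curve centred at age 72.
deaths = {x: 1000.0 * math.exp(-0.5 * ((x - 72) / 8.0) ** 2)
          for x in range(10, 111)}
mode, sigma = normal_age_and_dispersion(deaths)
print(mode, round(sigma, 2))
```

Only the ages above the crest enter the dispersion, because the deaths below it are mixed with those from causes typical of earlier life.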

If all the members of our original cohort were 
identical as to physical constitution and characteristics,
if they all were exposed to identically the
same outward influences acting upon their mode 
of life, it becomes evident from the law of causa- 
lity, which is the basis and justification of every 
collection of statistical data, that all members 
would die at the same moment. We see, however, 
immediately that such hypothetical conditions are 
not present in human society. The paramount 
feature of our material world is variation. No two 
persons are alike in regard to physical constitu- 
tion. Certain inherited characteristics, which are 
present in the individual in more or less pronoun- 
ced form, make themselves felt. No two persons 
or group of persons can be said to be exposed to
the same outward influences. The clergyman and
college professor living a sort of tranquil and 
sheltered life are not exposed to the same dangers 
as the working man or the man in business life. 
All these and other factors, almost infinite in 
number, tend to produce a decided variation in 
the actual duration of life. Of these influencing 
factors those relating to purely inherited or na- 
tural characteristics are without doubt the most 
powerful. If it were possible to eliminate certain 
forms of deaths due to infectious diseases, tuber- 
culosis and accidents, causes more or less due to 
outward influences, we should have left a number 
of causes due to a gradual wearing out of the 
human system, similar in many respects to the 
deterioration of the mechanism in ordinary ma- 
chinery. The death curve from such causes of death 
would be more related to the normal curve than 
the death curve which includes causes of death 
from non-inherent or anterior causes as menti- 
oned above. This statement is borne out in the 
shape of the Danish death curve. In Denmark 
where a very determined and largely successful 
fight has been carried on against tuberculosis, and 
where the accident rate is very low we also find 
that the curve is more symmetrical than for in- 
stance in this country or in England. 

This tendency to an approach towards the binomial
probability curve was already noted by
Lexis, who from such considerations tried to de- 
termine what he called a "Normalalter" or normal 
age for various countries and sample populations. 
Speaking of this attempt the eminent Danish statistician,
Harald Westergaard, says in his "Statistikens
Teori i Grundrids" (Copenhagen 1916):
"An unusually interesting attempt has been made
by Lexis to determine the normal age of man. 
A mortality table will, as a rule, have two 
strongly dominant maximum points for the num- 
ber of deaths. During the first year of life there 
dies a comparatively large number. From the age 
of 1 the number of deaths decreases and reaches 
its lowest point in early youth. It then again 
begins to increase, at times in wavelike motions, 
until the maximum point is reached at the old 
age period". 

"The clustering around the latter point has 
now a great likeness with the normal or Gaussian 
curve, and we might for this reason call this 
specific age the normal life age. For the cal- 
culation of such a normal age the argument may 
be put forth that experience shows that the great 
variations in mortality tend to disappear in old 
age. Let the rate of mortality in a certain generation
at age $x$ be $\mu_x$ and the number of the corresponding
survivors be $l_x$. The quantity $\mu_x l_x$ will
then increase from a certain point, while $l_x$ decreases,
in the beginning slowly, but later on at a
more rapid pace. During a long period of life the
quantity $\mu_x l_x$ (the number of deaths at a certain
age) will increase with age. Later on a reversed
motion takes place. But when this reversion will 
occur depends on many conditions, the successful 
fight against certain diseases, progress in econo- 
mic conditions, or change in the mode of living. 
All this exercises an important influence, and the 
maximum point occurs therefore sometimes sooner 
and sometimes later. It is also important to in- 
vestigate the natural selection in old age, which 
so to say divides the population in different strata, 
each with its own state of health. The healthiest 
of such groups will with the increase in age play 
a greater role. Here as everywhere it is the more 
important problem to study the clustering around 
the mean inside the special groups rather than to 
attempt to find a derived expression for the morta- 
lity. On the other hand, the correspondence be- 
tween the normal curve as established by Lexis 
is another testimony to the fact that this curve 
or formula very often can be applied, even in 
complicated expressions".

5. THE "DEATH CURVE" AS A COMPOUND CURVE

Lexis was satisfied to determine the normal age.
A more ambitious attempt to investigate
the mortality by means of frequency curves
throughout the whole period of life was made by 
the eminent English biometrician, Pearson, in a 
brilliant essay in his "Chances of Death". Pear- 
son took the number of deaths in the English 
Life Table No. 4 (males) and succeeded in break- 
ing up the compound curve into five component 
curves typical of old age, middle age, youth, child- 
hood and infancy. I want to advise my readers to 
study this brilliant and illuminating essay, especi- 
ally on account of its beautiful form of exposition 
which makes the whole subject appear in a most 
interesting light. 

Speaking of this attempt by Pearson, the 
American actuary, Henderson, is of the opinion 
that "the method has not, however, been applied
to other tables and it is difficult to lay a firm 
foundation for it, because no analysis of the deaths 
into natural divisions by causes or otherwise has 
yet been made such that the totals in the various 
groups would conform to these (the Pearson) 
frequency curves". We shall later on come back 
to this statement by Henderson, which we feel 
is a partial truth only. On the other hand, it must
be admitted that the system of Pearson's types of
skew frequency curves (by this time twelve in
number) is by no means easy to handle in
practical work and often requires a large amount
of arithmetical calculation. Moreover, there seems 
to be no rigorous philosophical foundation for the 
Pearsonian types of curves, and they can at their 
best only be said to be exceedingly powerful and 
neat instruments of graduation or interpolation. 

On the other hand, I am of the opinion that 
the goal can be reached more easily if we, instead 
of the Pearsonian curve types, make use of the 
Laplacean-Charlier and Poisson-Charlier frequency
curves, which are expressed in infinite series of
the form:

$$F(x) = \varphi(x) + \beta_3 \varphi^{III}(x) + \beta_4 \varphi^{IV}(x) + \ldots \tag{2}$$

or

$$F(x) = \psi(x) + \gamma_2 \Delta^2 \psi(x) + \gamma_3 \Delta^3 \psi(x) + \ldots \tag{3}$$

These two curve types have been treated
elsewhere by Gram, Charlier, Thiele, Edgeworth,
Jørgensen, Guldberg and other investigators, and
it is therefore not necessary to dwell further upon
their analytical properties, which were discussed
in Chapter I.
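To make the first of these series concrete, the sketch below evaluates a Laplacean-Charlier curve truncated after the fourth-derivative term. It assumes that $\varphi$ is the normal function of a standardised variable $z = (x - M)/\sigma$; the parameter names and the numerical values are hypothetical and chosen for illustration only.

```python
import math


def phi(z):
    """Normal function of the standardised variable z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def phi3(z):
    """Third derivative of phi."""
    return -(z ** 3 - 3.0 * z) * phi(z)


def phi4(z):
    """Fourth derivative of phi."""
    return (z ** 4 - 6.0 * z ** 2 + 3.0) * phi(z)


def laplacean_charlier(x, mean, sigma, beta3, beta4):
    """Series (2) truncated after the beta4 term, per unit of x."""
    z = (x - mean) / sigma
    return (phi(z) + beta3 * phi3(z) + beta4 * phi4(z)) / sigma


# Hypothetical skew curve centred at age 70 (illustrative values only).
step = 0.1
grid = [i * step for i in range(0, 1401)]          # ages 0 to 140
area = sum(laplacean_charlier(x, 70.0, 10.0, 0.10, 0.02) for x in grid) * step
print(round(area, 6))
```

The derivative terms alter the shape of the curve but contribute practically nothing to its area, so the total frequency is governed by the leading term alone.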

Returning now to the general form of our $d_x$
curve of the mortality table which we discussed
above, it is readily seen that this curve has all the
properties of a compound frequency curve, that
is, a curve which is composed of several minor or
subsidiary frequency curves, generally skew in
appearance. As proven both by Charlier and by
Jørgensen, any single valued and positive compound
frequency curve vanishing at both $+\infty$ and
$-\infty$ can be represented as the sum of Laplacean-Charlier
and Poisson-Charlier frequency curves.
We know thus a priori that the $d_x$ curve is compounded
of the two types of frequency curves. But
how are we to determine the separate component 
curves? It is readily admitted that no a priori 
reason will guide us here. The purely empirical 
observer might therefore abandon the project 
right here, because to all appearances it would 
seem hopeless to attempt a solution by purely 
empirical means. The positive rationalist does 
not despair so easily. "Very well", he says, "if 
we can not make further progress by purely 
empirical means, we are at least permitted to try 
deductive reasoning and attempt to bridge the gap 
by means of an hypothesis". The hypothesis I 
shall adopt is the following : 

The frequency distribution of deaths according
to age from certain groups of causes
of death among the survivors in a mortality
table tends to cluster around certain ages in
such a manner that the frequency distribution
can be represented by either a Laplacean-Charlier
or a Poisson-Charlier frequency
curve.

A study of mortuary records by age and cause 
of death immediately supports this hypothesis. 
We notice, for instance, that diseases such as
scarlet fever, measles, whooping cough and diphtheria
often cause death among children, but
rarely seem to affect older people. We know, for
instance, that there is a much greater probability
that a 5-year old boy will die from scarlet fever
than a man at the age of 40 will die from the
same disease. On the other hand, there is quite
a large probability that an old man at age 85
will die from diseases of the prostate gland, while
such an occurrence is almost unheard of among
boys. Similarly deaths from cancer and Bright's 
disease are very rare in youth, but quite frequent 
in early old age. Tuberculosis, on the other hand, 
causes its greatest ravages in middle life, and has 
but little effect upon older ages. 



6. MATHEMATICAL PROPERTIES OF COMPONENT FREQUENCY CURVES

Leaving, however, the question of the grouping of causes
of death into a limited number
of typical groups to a later discussion, we shall
in the meantime see how the hypothesis can carry
us over the difficulties. Let us for the moment
assume that we are able to group the causes of
death into say 7 or 8 groups. We shall also as- 
sume that we know the percentage frequency 
distribution of deaths according to age in each 
of the groups. This means in other words that 
we know the equation of the frequency curves 
giving the percentage distribution. Let the ana- 
lytical expression for these frequency curves be 
denoted by the symbols : 

$$F_I(x),\ F_{II}(x),\ F_{III}(x),\ \ldots,\ F_{VIII}(x). \tag{4}$$

Again, let the total number of deaths among the
survivors in the mortality table from causes of
death according to the above grouping be denoted
by

$$N_I,\ N_{II},\ N_{III},\ N_{IV},\ \ldots,\ N_{VIII} \text{ respectively.} \tag{5}$$

The number of deaths in a certain age interval,
say between 50-54, can then be expressed as
follows:

$$\sum_{x=50}^{54} d_x = \sum_{x=50}^{54} N_I F_I(x) + \sum_{x=50}^{54} N_{II} F_{II}(x) + \ldots + \sum_{x=50}^{54} N_{VIII} F_{VIII}(x). \tag{6}$$



In this relation the only known quantities are
the equations for the frequency curves $F_I(x)$,
$F_{II}(x)$, ..., $F_{VIII}(x)$ of the percentage frequency
distribution according to age in each of the eight
groups. Neither $d_x$ nor any of the various N's are
known. The only relation we know a priori among
the quantities N is the following:

$$N_I + N_{II} + N_{III} + \ldots + N_{VIII} = 1{,}000{,}000. \tag{7}$$

The latter equation is simply a mathematical 
expression for the simple fact that the sum total 
of the sub-totals of the various groups of causes 
of death, in other words the deaths from all 
causes among the survivors in the mortality table, 
must equal the radix of the entrants of our orig- 
inal cohort of 1,000,000 lives at age 10. Viewed 
strictly from the standpoint of frequency curves, 
we might express the same fact by saying that 
the sum of the areas of the various component 
curves must equal 1,000,000. 
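Equations (6) and (7) can be illustrated with a small numerical sketch. Three component curves stand in for the eight of the text, and their shapes and areas are invented for the example; only the bookkeeping, not the figures, reflects the method.

```python
import math

AGES = range(10, 111)


def normal_shape(mean, sigma):
    """Percentage distribution over AGES, normalised so that it sums to 1."""
    weights = [math.exp(-0.5 * ((x - mean) / sigma) ** 2) for x in AGES]
    total = sum(weights)
    return {x: w / total for x, w in zip(AGES, weights)}


# Hypothetical component curves F_k(x) and areas N_k (illustrative only).
F = {"I": normal_shape(78, 8),
     "II": normal_shape(60, 12),
     "III": normal_shape(25, 9)}
N = {"I": 600_000, "II": 300_000, "III": 100_000}


def deaths(age_from, age_to):
    """Sum of N_k * F_k(x) over all groups and over the ages in the interval."""
    return sum(N[k] * F[k][x] for k in F for x in range(age_from, age_to + 1))


total = deaths(10, 110)      # all ages: the sum of the areas
d_50_54 = deaths(50, 54)     # one quinquennial interval
print(round(total), round(d_50_54))
```

Summed over all ages the compound curve returns the radix, which is the statement of equation (7) in terms of areas.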

It is readily seen that on the assumption that 
the expressions of the different F(x) conform to 
the above hypothesis it is possible to find $d_x$ for
any age or age interval if we can determine the 
values of the different N's. It is in this possibility 
that the importance of the proposed method lies, 
and we shall now show how it is possible to deter- 
mine the N's without knowing the exposed to 
risk.

7. OBSERVATION EQUATIONS

Consider for the moment the following expression:

$$R_{III}(x) = \frac{\sum_{x=50}^{54} N_{III} F_{III}(x)}{\sum_{x=50}^{54} N_I F_I(x) + \sum_{x=50}^{54} N_{II} F_{II}(x) + \sum_{x=50}^{54} N_{III} F_{III}(x) + \ldots + \sum_{x=50}^{54} N_{VIII} F_{VIII}(x)} \tag{8}$$
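A ratio of this form involves deaths only. As a hedged illustration, suppose the deaths of each group in an age interval arise as an exposure multiplied by a cause-specific death rate; the rates, the exposures and the function names below are hypothetical. Whatever the exposure, it cancels in the ratio.

```python
def proportional_death_ratios(deaths_by_group):
    """Share of each group in the deaths from all causes in one age interval."""
    total = sum(deaths_by_group.values())
    return {group: d / total for group, d in deaths_by_group.items()}


# Hypothetical cause-specific death rates for the age interval 50-54.
rates = {"I": 0.001, "II": 0.004, "III": 0.006, "other": 0.009}


def deaths_for(exposure):
    """Deaths by group for a given (in practice unknown) number exposed to risk."""
    return {group: exposure * rate for group, rate in rates.items()}


small = proportional_death_ratios(deaths_for(10_000))
large = proportional_death_ratios(deaths_for(250_000))
print(small["III"], large["III"])
```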



What does this equation represent? Simply the
proportionate ratio of deaths in group III to the
total number of deaths in all type groups (in
other words the deaths from all causes) in the age
interval 50-54. Such ratios are usually known as
proportional death ratios. It is readily seen that
these proportionate death ratios are dependent on
the deaths alone and absolutely independent of
the number exposed to risk, provided the total
number of deaths from all causes in a certain age
group is large enough to eliminate variations due
to random sampling. 1 In other words, we can find
a numerical value for the term $R_{III}(x)$ on the left
side of the equation from our death records alone
without reference to the exposed to risk in this
interval. Similar proportionate death ratios can
of course without difficulty be determined for the
other groups of causes of death and for arbitrary
ages or age intervals. In this manner we can
determine a system of observation equations with
known numerical values of $R_k(x)$ $(k = I, II, III, \ldots)$.
The fact that the number of observation equations
in this system is much larger than the number of
the unknown N's makes it possible to determine
these unknowns by the method of least squares.

1 Strictly speaking this statement is only true for an
age interval of one year or less and may in the case of
large perturbing influences in the population exposed to
risk be subject to appreciable errors when we use large
age intervals of 10 years or more in our grouping for the
computing of $R(x)$. When the age interval for the grouping
of causes of deaths by attained ages is 5 years or less
the error committed in assuming $R(x)$ as being independent
of the number exposed to risk is in most cases
negligible. One of the difficulties encountered in the
construction of a mortality table for Massachusetts Males
was that the age interval used for the grouping was 10
years instead of 5 years or less. See in this connection
the remarks at the beginning of paragraph 11 and at
the conclusion of paragraph 16 of the present chapter.

Probably the simplest manner is first to determine
by simple approximation methods, or by
mere inspection, approximate values for the
various N's and then make final adjustments by
the method of least squares.

Let, for instance,

$$'N_I,\ 'N_{II},\ 'N_{III},\ \ldots,\ 'N_{VIII}$$

be the first approximations of the areas of the
various groups of frequency curves so that

$$N_I = \alpha_1\,'N_I,\quad N_{II} = \alpha_2\,'N_{II},\quad \ldots,\quad N_{VIII} = \alpha_8\,'N_{VIII}. \tag{9}$$



Let us furthermore introduce the following
symbols:

$$'N_I F_I(x) = \Phi_1(x),\quad 'N_{II} F_{II}(x) = \Phi_2(x),\quad \ldots,\quad 'N_{VIII} F_{VIII}(x) = \Phi_8(x). \tag{10}$$

The different values of

$$\Phi_1(x),\ \Phi_2(x),\ \Phi_3(x),\ \ldots,\ \Phi_8(x)$$

may then be regarded as a system of component
frequency curves to which we now must apply the
different correction factors $\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_8$ in order
to fit the curves to the observed proportional death
ratios, $R(x)$, for the various groups of typical
causes of death. Let us for example assume that
the observed death ratio of a certain age (or age
group), $x$, under a certain group of causes of
death, say group No. III, is $R_{III}(x)$. We have
then the following observation equation:

$$R_{III}(x) = \alpha_3 \Phi_3(x) : [\alpha_1 \Phi_1(x) + \alpha_3 \Phi_3(x) + \alpha_4 \Phi_4(x) + \ldots + \alpha_8 \Phi_8(x) + \alpha_2 \Phi_2(x)]. \tag{11}$$

Since the sum of the areas of the different component
curves necessarily must equal 1,000,000 it
is easy to see that we may write the factor $\alpha_2$
in the last term of the denominator in the following
form:

$$\alpha_1 \sum \Phi_1(x) + \alpha_2 \sum \Phi_2(x) + \ldots + \alpha_8 \sum \Phi_8(x) = 1{,}000{,}000$$

or

$$\alpha_2 = \Big(1{,}000{,}000 - \big[\alpha_1 \sum \Phi_1(x) + \alpha_3 \sum \Phi_3(x) + \ldots + \alpha_8 \sum \Phi_8(x)\big]\Big) : \sum \Phi_2(x) = k_0 - [k_1 \alpha_1 + k_3 \alpha_3 + \ldots + k_8 \alpha_8]$$

where

$$k_0 = \frac{1{,}000{,}000}{\sum \Phi_2(x)},\quad k_1 = \frac{\sum \Phi_1(x)}{\sum \Phi_2(x)},\quad k_3 = \frac{\sum \Phi_3(x)}{\sum \Phi_2(x)},\quad \ldots,\quad k_8 = \frac{\sum \Phi_8(x)}{\sum \Phi_2(x)}. \tag{12}$$

The expression for $R_{III}(x)$ can then be put in the
following form:

$$R_{III}(x) = \alpha_3 \Phi_3(x) : [\alpha_1 \Phi_1(x) + \alpha_3 \Phi_3(x) + \alpha_4 \Phi_4(x) + \ldots + \alpha_8 \Phi_8(x) + (k_0 - k_1 \alpha_1 - k_3 \alpha_3 - \ldots - k_8 \alpha_8) \Phi_2(x)]. \tag{13}$$

Similar observation equations for the other
groups are derived without difficulty.

Once having formed the observation equations 
it is simply a matter of routine work to compute 
the normal equations from which the values of
the unknown N's can be found. We shall, how- 
ever, not go into detail with the derivation of the 
necessary formulas, since this is a process which 
belongs wholly to the domain of the theory of 
least squares and which has received adequate 
treatment elsewhere. (See for instance Brunt's 
Combination of Observations.) 
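The least squares step can be sketched on a toy problem. The sketch is not the formulation with correction factors given above: it multiplies each observation equation by its denominator, which makes the system linear in the N's, and eliminates one unknown through the total of 1,000,000. Three invented component curves replace the eight groups, and the "observed" ratios are computed from assumed areas so that the result can be checked.

```python
import math

AGES = list(range(10, 111, 5))        # quinquennial observation points
TOTAL = 1_000_000


def normal_shape(mean, sigma):
    """Percentage distribution over AGES, normalised so that it sums to 1."""
    weights = [math.exp(-0.5 * ((x - mean) / sigma) ** 2) for x in AGES]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical component curves and "true" areas (illustrative only).
F = [normal_shape(78, 8), normal_shape(60, 12), normal_shape(25, 9)]
TRUE_N = [600_000.0, 300_000.0, 100_000.0]

# Observed proportional death ratios R_k(x) = N_k F_k(x) / sum_j N_j F_j(x).
R = []
for i in range(len(AGES)):
    denom = sum(TRUE_N[j] * F[j][i] for j in range(3))
    R.append([TRUE_N[k] * F[k][i] / denom for k in range(3)])

# Each observation gives R_k(x) * sum_j N_j F_j(x) - N_k F_k(x) = 0.
# Substituting N_3 = TOTAL - N_1 - N_2 leaves a linear system in N_1, N_2.
rows, rhs = [], []
for i in range(len(AGES)):
    for k in range(3):
        c = [R[i][k] * F[j][i] - (F[k][i] if j == k else 0.0) for j in range(3)]
        rows.append((c[0] - c[2], c[1] - c[2]))
        rhs.append(-TOTAL * c[2])

# Normal equations (A^T A) n = A^T b, solved by Cramer's rule for 2 unknowns.
s11 = sum(a * a for a, _ in rows)
s12 = sum(a * b for a, b in rows)
s22 = sum(b * b for _, b in rows)
t1 = sum(a * y for (a, _), y in zip(rows, rhs))
t2 = sum(b * y for (_, b), y in zip(rows, rhs))
det = s11 * s22 - s12 * s12
n1 = (t1 * s22 - s12 * t2) / det
n2 = (s11 * t2 - s12 * t1) / det
n3 = TOTAL - n1 - n2
print(round(n1), round(n2), round(n3))
```

With real death records the ratios carry sampling error, the equations are no longer exactly consistent, and the surplus of observation equations over unknowns is what the adjustment relies on.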

8. CLASSIFICATION OF CAUSES OF DEATH

We think it more advantageous to illustrate the method by
a concrete example. As an
illustration we may take the case of Michigan
Males in the period 1909-1915. The
mortuary records of Males in Michigan are
for that period given in the reports issued
annually by the Secretary of State on "Registration
of Births and Deaths, Marriages and Divorces
in Michigan". The deaths by sex, age and cause
of death are given in quinquennial age groups. A
very serious drawback is the grouping of all ages
above 80 into a single age group instead of in at
least 4 or 5 quinquennial age groups. This makes
it impossible to obtain good observation equations
for ages above 80. When we consider that about
one fifth of the original entrants at age 10 in the 
mortality table die after the age of 80, it is readily 
seen that this defect in the Michigan data is of a 
very serious character, which makes it out of the 
question to determine correctly the areas of the 
curves for middle old age and extreme old age. 
For ages below 70 these curves do not play so 
important a role, and the method ought therefore
at these ages to yield satisfactory results. We now
make the assertion that the deaths among the 
survivors in the final life table can be grouped in 
the following typical groups. 

Causes of Death typical of:

Group I      Extreme Old Age.
Group II     Middle Old Age.
Group III    Early Old Age.
Group IV     Middle Life.
Group V      Early Middle Life.
Group VI     Pulmonary Tuberculosis, Etc.
Group VIIa   Early Life Occupational Hazard.
Group VIIb   Middle Life Occupational Hazard.
Group VIIIa  Childhood.

The classification of causes of death according 
to this scheme is given in the following table,
marked Table A.

Table A. Michigan Males 1909–1915

Classification of causes of death according to the
chosen system of curves.

No. in International
Classification.

GROUP I

81. Diseases of the arteries.
124. Diseases of the bladder.
125–133. Other diseases of the genito-urinary system.
142. Gangrene.
154. Old age.
126. Diseases of the prostate.

GROUP II

10. Influenza.
47–48. Rheumatism.
64. Apoplexy.
65. Softening of the brain.
66. Paralysis.
79. Heart disease.
82. Embolism.
89. Acute bronchitis.
90. Chronic bronchitis.
91. Broncho-pneumonia.
94. Congestion of the lungs.
96–97. Asthma and emphysema.
103. Other diseases of the stomach.
105. Diarrhea and enteritis (over 2 years).
14. Dysentery.

GROUP III 

39. Cancer of the mouth. 

40. Cancer of the stomach and liver. 

41. Cancer of the intestines. 

44. Cancer of the skin. 

45. Cancer of other organs.

46. Tumors. 
50. Diabetes. 

53 — 54. Leukemia and anemia. 

63. Other diseases of the spinal cord. 
68. Other forms of mental diseases. 
80. Angina pectoris. 
109 — 110. Hernia, intestinal obstruction, and 
other diseases of the intestines. 

120. Bright's disease. 

121. Other diseases of the kidneys 
123. Calculi of urinary passages. 

GROUP IV 
56. Alcoholism. 
18. Erysipelas. 
62. Locomotor ataxia. 
73 — 76. Other diseases of the nervous system.
77. Pericarditis.
78. Endocarditis.
83. Diseases of the veins.
84. Diseases of the lymphatics.
85–86. Other diseases of the circulatory system.
87. Diseases of the larynx.
88. Diseases of the thyroid body.
92. Pneumonia.
93. Pleurisy.
95. Gangrene of the lungs.
98. Other diseases of the respiratory system.
99–101. Diseases of the mouth, pharynx, and oesophagus.
111. Acute yellow atrophy of the liver.
113. Cirrhosis of the liver.
114. Biliary calculi.
115–116. Diseases of the liver and spleen.
118. Other diseases of the digestive system.
143–145. Furuncle, abscess, and other diseases of the skin.
147–149. Diseases of the joints, and locomotor system.

GROUP V

4. Malarial fever.
13. Cholera nostras.
20. Septicemia.
24. Tetanus.
32. Pott's disease.
33. White swellings.
34. Tuberculosis of other organs.
35. Disseminated tuberculosis.
55. Other general diseases.
60. Encephalitis.
70–71. Convulsions.
102. Ulcer of the stomach.
117. Peritonitis.
119. Acute Nephritis.
164. Diseases of the bones.
155. Suicide by poison.
156. Suicide by asphyxia.
157. Suicide by hanging.
158. Suicide by drowning.
159. Suicide by firearms.
160. Suicide by cutting instruments.
161. Suicide by jumping from high places.
163. Suicide by other or unspecified means.
164–165. Accidental poisonings.
166. Conflagration.
167. Burns (conflagration excepted).
168. Inhalation of noxious gases.
172. Traumatism by fall.
175 (2). Traumatism by electric railway.
175 (3). Traumatism by automobiles.
175 (4). Traumatism by other vehicles.
176. Traumatism by animals.
178. Cold and freezing.
179. Effects of heat.
185. Fractures and dislocations (cause not specified).

GROUP VI 

28. Tuberculosis of the lungs. 

29. Miliary tuberculosis. 
37 — 38. Venereal diseases. 

186. Other accidental traumatism. 
57 — 59. Chronic poisoning. 

67. General paralysis of the insane. 
31. Abdominal tuberculosis. 

GROUP VII 

1. Typhoid fever. 

69. Epilepsy. 

108. Appendicitis. 

182. Homicide. 

169. Accidental drowning. 

170. Traumatism by firearms. 

171. Traumatism by cutting instruments.

173. Traumatism by mines and quarries.

174. Traumatism by machinery.
175 (1). Traumatism by railroads.

180. Lightning.
61. Meningitis.

GROUP VIII 

5. Smallpox. 

6. Measles. 

7. Scarlet fever. 

8. Whooping cough. 

9. Diphtheria and croup. 
30. Tubercular meningitis. 

150. Congenital malformations. 

9. OUTLINE OF COMPUTING SCHEME.

The number of deaths in the various groups
according to the above classification and arranged
according to age during the period 1909–1915 is
given in the table B on page 140.

From that table it is a simple matter to com- 
pute the proportionate death ratios of the separate 
groups of causes of death. Such a computation is 
shown in table C on page 141. 

It is readily seen that these death ratios are
independent of the number exposed to risk.
Moreover, the number of observations seems to be
sufficiently large to eliminate serious variations
due to random sampling. This might perhaps not
hold true for the age intervals 10 to 14 and 15 to
19, where not only is random sampling error
present, but a somewhat modified classification
also seems necessary. I have, however, not used
the observed proportionate death ratios for the
two younger age intervals in my computations,
which only took into account the ratios above
age 20. For this reason I do not deem it necessary
to go into a closer investigation of a
re-classification of causes of death for these
younger age groups. A more serious defect which
cannot be overcome is presented in the ages above
80 where, as mentioned before, a classification
according to age is absent in the original records
for the state of Michigan. The fact that the
highest number of deaths (12,473) occurred in ages
above 80 makes this defect more serious than the
omission of a re-classification of causes of death
below 20.

So far we have only been concerned with the 
first step in the complete induction according to 
the model of Jevons, namely that of simple observ- 
ation. The next step in the induction is the hypoth- 
esis. We present now the following working 
hypothesis. 

The frequency distribution of deaths according
to age of the above groups of causes of death
among the survivors of an original cohort of
1,000,000 entrants at age 10 can be represented by
a system of frequency curves determined by the
following characteristic parameters:

                       Parameters

Group    Mean (years)   Dispersion (years)   Skewness   Excess
I           79.6             9.5730          + .1066    + .0546
II          70.5            12.8000          + .0967    + .0126
III         65.5            13.6870          + .1248    + .0650
IV          59.5            17.0890          + .1790    - .0106
V           65.5            19.9411          + .0555    - .0367
VI          44.5            16.0352          - .0124    - .0272
VIIb        57.5            12.1552          + .0008    - .0005
VIIa     Poisson-Charlier Curve: Modulus = 28.5 years, Eccentricity = 1.0001
VIIIa    Poisson-Charlier Curve: Modulus = 13.5 years.



From these parameters and from well-known 
tables of the probability or normal frequency curve 
and its various derivatives it is easy to determine 
the frequency distribution for any desired interval. 
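
As an illustration of how a share of deaths in an age interval might be obtained from such parameters, the following is a minimal sketch only. It assumes, without confirmation from the text, that each curve is a Gram-Charlier Type A series in which the printed skewness and excess enter directly as the coefficients of the third and fourth derivatives of the normal curve; the sign and scaling conventions actually used for the table may differ. The function names are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def type_a_density(x, mean, dispersion, c3, c4):
    """Density at age x of an ASSUMED Gram-Charlier Type A curve.

    c3 and c4 multiply the third and fourth derivatives of the normal
    density; treating the printed skewness and excess as c3 and c4 is an
    assumption made for this sketch.
    """
    z = (x - mean) / dispersion
    phi = normal_pdf(z)
    d3 = -(z ** 3 - 3.0 * z) * phi               # third derivative of phi
    d4 = (z ** 4 - 6.0 * z ** 2 + 3.0) * phi     # fourth derivative of phi
    return (phi + c3 * d3 + c4 * d4) / dispersion


def share_in_interval(lo, hi, mean, dispersion, c3, c4, steps=2000):
    """Midpoint-rule integral of the density over the ages lo to hi."""
    h = (hi - lo) / steps
    return h * sum(
        type_a_density(lo + (i + 0.5) * h, mean, dispersion, c3, c4)
        for i in range(steps)
    )


# Group II parameters as printed: mean 70.5, dispersion 12.8000,
# skewness +.0967, excess +.0126.
share_50_54 = share_in_interval(50.0, 55.0, 70.5, 12.8, 0.0967, 0.0126)
whole_curve = share_in_interval(-60.0, 200.0, 70.5, 12.8, 0.0967, 0.0126, steps=20000)
```

Under this assumed form the derivative terms integrate to zero over the whole range, so the curve keeps unit area and multiplying by a group's area N gives deaths in the interval. No claim is made that this reproduces the tabulated values used in the text.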

For this system of frequency curves we now
shall try to find the various areas
$N_{I}, N_{II}, N_{III}, \ldots, N_{VIII}$ so as to
conform to the observed values of $R_x$ in Table C.
As a first approach to the final values of N, we
may by an inspection (which of course is improved
upon by a long practice in curve fitting) choose
the following approximations. 1

Group            Approximate Value of 'N.

I                      123000
II                     366000
III                    183000
IV                     105000
V                       75000
VI                      70000
VIIa & VIIb             61000
VIII                    17000

Total                 1000000

These preliminary numerical values represent
the first approximations of the areas of the various
frequency curves. The sequence represented by

$$'N_{I}F_{I}(x),\quad 'N_{II}F_{II}(x),\quad 'N_{III}F_{III}(x),\quad \ldots,\quad 'N_{VIII}F_{VIII}(x) \tag{14}$$

gives the number of deaths at age x. We notice
thus that by multiplying the various equations of
frequency curves for arbitrary age intervals with
their respective 'N's we can get a first
approximation of the final death curve. I give on
page 144 an approximate table arranged in 5 year
intervals.

1 These numbers represent as a matter of fact a first
rough approximation of the areas of the different
component curves by means of the method of point contours.
Hence it is to be expected that the final adjustments
will be comparatively small. This fact has, however, no
influence upon the application of the method.
We might now first compute the various factors
$k_0, k_1, k_3, \ldots, k_8$ which will be common for
all observation equations. We have, referring to
the above formulas (11 and 12), for the various
k's (15):

$$
\begin{aligned}
k_0 &= \frac{1000000}{365995}, & k_1 &= \frac{123089}{365995}, & k_3 &= \frac{183045}{365995}, & k_4 &= \frac{104888}{365995},\\
k_5 &= \frac{75030}{365995}, & k_6 &= \frac{69996}{365995}, & k_7 &= \frac{61003}{365995}, & k_8 &= \frac{17002}{365995}.
\end{aligned}
\tag{15}
$$

Or

$$k_0 = 2.732,\; k_1 = 0.336,\; k_3 = 0.500,\; k_4 = 0.287,\; k_5 = 0.205,\; k_6 = 0.191,\; k_7 = 0.167,\; k_8 = 0.046.$$
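
The arithmetic of (15) can be checked with a few lines of Python. This is a sketch for verification only: the numerators are those printed in (15), which are slightly finer than the rounded first approximations tabulated earlier, and the pairing of $k_7$ with the combined Groups VIIa and VIIb is inferred from the order of that table rather than stated in the text.

```python
# Numerators exactly as printed in equation (15); Group II is the divisor.
first_areas = {
    "I": 123089, "II": 365995, "III": 183045, "IV": 104888,
    "V": 75030, "VI": 69996, "VII": 61003, "VIII": 17002,
}
radix = 1_000_000
divisor = first_areas["II"]

# k0 comes from the radix; the remaining k's skip index 2 (Group II itself).
k = {"k0": radix / divisor}
for index, group in zip((1, 3, 4, 5, 6, 7, 8),
                        ("I", "III", "IV", "V", "VI", "VII", "VIII")):
    k[f"k{index}"] = first_areas[group] / divisor

rounded = {name: round(value, 3) for name, value in k.items()}
print(rounded)
```

Each quotient rounds to the three-decimal value given in the text.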

To illustrate the further process of the
computation of the observation equations, let us
take a certain age interval, say the interval
between 50-54. The value of $\Phi_2$ taken from the
above table is 163.39. The value of $R_{III}(x)$ for
this interval is 0.234 (see table page 141). Hence
we have the following observation equation (16).

$$
0.234 = \frac{104.53a_3}{\begin{aligned}[t]
&15.76a_1 + 104.53a_3 + 84.16a_4 + 64.52a_5 + 73.55a_6 + 35.01a_7 + 0.00a_8\\
&+ (2.732 - 0.336a_1 - 0.500a_3 - 0.287a_4 - 0.205a_5 - 0.191a_6 - 0.167a_7 - 0.046a_8)\,163.39
\end{aligned}}
\tag{16}
$$

After a few simple reductions this may be
brought to the following form:

$$9.16a_1 + 99.19a_3 - 8.72a_4 - 7.26a_5 - 9.91a_6 - 1.81a_7 + 1.76a_8 - 104.45 = 0. \tag{17}$$
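
The "few simple reductions" leading from (16) to (17) amount to clearing the denominator, collecting the terms in each correction factor, and changing sign throughout. A short Python check, using only the figures printed in (16), is sketched below; the variable names are illustrative and not the author's.

```python
R = 0.234          # observed proportionate death ratio, Group III, ages 50-54
phi_2 = 163.39     # Group II value for the interval, as quoted in the text
k0 = 2.732
# Interval values and k factors for the remaining groups, as printed in (16).
phi = {1: 15.76, 3: 104.53, 4: 84.16, 5: 64.52, 6: 73.55, 7: 35.01, 8: 0.00}
k = {1: 0.336, 3: 0.500, 4: 0.287, 5: 0.205, 6: 0.191, 7: 0.167, 8: 0.046}
target = 3         # the group whose ratio is being fitted

# R * [sum_i (phi_i - k_i * phi_2) a_i + k0 * phi_2] - phi_target * a_target = 0,
# multiplied through by -1 so the signs match the printed form of (17).
coefficients = {}
for i in phi:
    c = R * (phi[i] - k[i] * phi_2)
    if i == target:
        c -= phi[i]
    coefficients[i] = -c
constant = -R * k0 * phi_2

for i, c in coefficients.items():
    print(f"a{i}: {c:+.2f}")
print(f"constant: {constant:+.2f}")
```

The coefficients agree with (17) to two decimals.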

In the routine work I usually use a system of 
computing the various equations which is out- 
lined in detail in the accompanying tabular scheme 
referring to all the groups in the age interval 
50-54 and shown on pages 148-154. 

Similar observation equations are arrived at in
exactly the same manner for other groups and
other age intervals. For the whole interval from
age 20 and upwards we get in this way 96
observation equations from which to determine the
correction factors. The coefficients of these
observational equations are then written down, and
their various products formed in turn. We do not
deem it necessary to give all these observational
equations and their coefficients for all the 96
observations, but shall limit ourselves to giving
all the necessary computations for the interval
from 50-54 as previously considered. With the usual
system of notation employed in the method of
least squares we get the scheme on pages 148-154.

Normal Equations, Michigan Males 1909–1915.

723763  400750  218930  150776  135184  115318   30325  1801152
        877847  253187  176242  149858  129697   34600  2053941
                237159   90440   72317   62110   16246   964843
                        105346   47022   39939   10576   628608
                                 76774   28909    8668   525295
                                         53378    7012   437390
                                                  2391   111625

The addition of the various columns of the sum 
products of the coefficients gives us finally the 
above set of normal equations of which we only 
submit the coefficients in the usual scheme em- 
ployed in the method of least squares. 

Solving the above system of normal equations 
by means of the well-known method devised by 
Gauss, we obtain finally the values on page 154 for 
the various a's by which the approximate values 
'N must be multiplied in order to yield the
probable values of N.
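
For readers who wish to experiment, the printed scheme can be expanded into a full symmetric system and solved mechanically. The sketch below uses ordinary elimination with partial pivoting, which is a stand-in for, not a reproduction of, the Gaussian tabular procedure referred to in the text. It assumes that each printed row gives the diagonal and upper-triangle coefficients followed by the constant term, and it makes no claim that the solution reproduces the published correction factors, since the printed figures may have suffered in transcription.

```python
# Upper triangle of the normal equations, row by row, as printed.
upper = [
    [723763, 400750, 218930, 150776, 135184, 115318, 30325],
    [877847, 253187, 176242, 149858, 129697, 34600],
    [237159, 90440, 72317, 62110, 16246],
    [105346, 47022, 39939, 10576],
    [76774, 28909, 8668],
    [53378, 7012],
    [2391],
]
# Last number of each printed row, taken here as the constant term (assumption).
rhs = [1801152, 2053941, 964843, 628608, 525295, 437390, 111625]

n = len(rhs)
A = [[0.0] * n for _ in range(n)]
for i, row in enumerate(upper):
    for offset, value in enumerate(row):
        j = i + offset
        A[i][j] = A[j][i] = float(value)   # mirror into the lower triangle


def solve(matrix, vector):
    """Gaussian elimination with partial pivoting on a copy of the system."""
    size = len(vector)
    aug = [row[:] + [float(b)] for row, b in zip(matrix, vector)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in reversed(range(size)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - tail) / aug[r][r]
    return x


solution = solve(A, rhs)
residual = [sum(A[i][j] * solution[j] for j in range(n)) - rhs[i] for i in range(n)]
print([round(value, 5) for value in solution])
```

The residual list gives a quick check that the computed values satisfy the system as entered.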

Sum: 6630212.0 107358.0

Correction Factors, a.

Group I       1.03284
Group II      1.00017
Group III     1.03635
Group IV      1.03731
Group V       1.00956
Group VI      0.97334
Group VIIa    0.90332
Group VIIb    0.60565
Group VIII    1.13743

Applying the above correction factors to the
respective values of 'N, we get finally as the total
areas of the respective component curves:

Group I         127,131
Group II        366,059
Group III       189,699
Group IV        108,750
Group V          75,747
Group VI         68,130
Group VIIa       33,032
Group VIIb       12,133
Group VIII       19,339

Total         1,000,000
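
The final step is a simple multiplication of each first approximation by its correction factor. The following sketch checks it for the groups where the printed figures agree, taking the first approximations from the numerators of (15). Groups II and IV come out slightly different from the printed areas when computed this way (366,057 and 108,801 against 366,059 and 108,750), which may reflect rounding in the published factors or damage in transcription; they are therefore left out of the check, as are VIIa and VIIb, whose separate first approximations are not given.

```python
# First approximations (numerators of equation 15) and printed correction factors.
first_areas = {"I": 123089, "III": 183045, "V": 75030, "VI": 69996, "VIII": 17002}
factors = {"I": 1.03284, "III": 1.03635, "V": 1.00956, "VI": 0.97334, "VIII": 1.13743}
printed_final = {"I": 127131, "III": 189699, "V": 75747, "VI": 68130, "VIII": 19339}

# Final area = first approximation times correction factor, to the nearest unit.
final_areas = {group: round(first_areas[group] * factors[group]) for group in first_areas}

for group, area in final_areas.items():
    print(f"Group {group}: computed {area}, printed {printed_final[group]}")
```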

Multiplying the equations of the various frequency 
curves, F(x), of the percentage distribution in 
each group with the above values of N we ob- 
tain finally the complete mortality table as will 
be given in the Appendix. The final graphical 
representation of the frequency curves is shown 
in Figure 2. 

10. GOODNESS OF FIT.

This completes the third step in the inductive
process. The fourth and final step is the
verification of the results thus arrived at by a
mere deductive process. Here it must be remembered
that the condition which the final component
frequency curves shall fulfill is the one that
observed proportionate death ratios shall agree as
closely as possible with the expected or theoretical
proportionate death ratios as computed from the
final table. In this connection it must be borne in
mind that the observed proportionate death ratios
are given in quinquennial age groups. Thus the
observed proportionate death ratios in a certain
age interval, as for example between 50 and 54, are
really the average or "central" proportionate death
ratios at age 52. From the complete table it is,
however, possible to compute the proportionate
death ratios for each specific age. Graphically the
expected proportionate death ratios will therefore
represent a continuous curve, while the observed
ratios will be represented by a rectangular shaped
column diagram. Such a graphical representation
is shown in Fig. 3, which simply represents the
figures in Table C and Table E in graphical form.
The "goodness of fit" of the "expected" or
theoretical values to the "actual" or observed
values is seen to be very close, especially in the
largest and most important groups. It is only in
the combined groups VIIa and VIIb that the "fit"
might probably be open to criticism for higher
ages, but even here the deviation is small between
the actual and theoretical values. A very small
increase in the area of the VIIb curve would easily
adjust this difference. It is, however, doubtful if
such a correction or adjustment would have any
noteworthy effect upon the ultimate mortality rates
$q_x$, and I do not consider it worth while to go to
the additional trouble of recomputing the areas,
especially in view of the fact that the observation
data above the age of 80 are not exact and detailed
enough to be used in this method of curve fitting.
For ages up to 70 or 75 I consider, however, the
table as thus constructed as sufficiently accurate
for all practical purposes.

11. MASSACHUSETTS 1914–1917.

As another example of the method I take the
construction of a mortality table for the State of
Massachusetts from the mortuary records for the
three years 1914, 1915 and 1916. The records as
given by the Registration reports are better than
the records for Michigan, inasmuch as they have
avoided the deplorable practice of grouping all
deaths above the age of 80 into a single age group.
On the other hand, the classifications of cause of
death in Massachusetts by attained age are given in
ten year age groups only. Hence it is readily seen
that we will only be able to secure half as many
observation equations as in the case of the five
year interval in Michigan. This rather large
grouping puts the method to a severe test. In spite
of this drawback I shall for the benefit of the
readers briefly outline the results I have obtained
from an analysis of the Massachusetts data.

While for the Michigan data I employed a sy- 
stem of frequency curves previously used with 
success for certain Scandinavian data, I found it 
was easier to fit the Massachusetts data to a sy- 
stem of frequency curves used in the construction 
of a mortality table for England and Wales for 
the years 1911 and 1912 from the mortuary records 
of deaths by age and cause among male lives. The 
classification by age of the causes of death in 8 
groups is also different from that of Michigan, 
especially for middle life and younger ages. The 
parameters of the system of component frequency 
curves to which I fitted the Massachusetts data are 
shown in the following table F : 

Table F.

Parameters of the System of Frequency Curves
for Massachusetts Males 1914–1916.

Group    Mean (years)   Dispersion (years)   Skewness   Excess
I           78.70            7.9775          + .0920    + .0331
II          68.00           12.2051          + .1151    + .0234
III         63.05           13.0532          + .1210    + .0471
IV          60.45           17.8552          + .0983    - .0091
V           49.60           18.6100          + .0328    - .0309
VI          43.80           14.6750          - .0091    - .0272
VIIb        57.40           12.1550          + .0021    - .0026

VIIa and VIIIa constructed from Poisson-Charlier Curves.

The observed number of deaths according to the
8 groups of causes of death, and their corresponding
proportionate death ratios, are given in the
following tables G and H.

By finding first approximate values and then by
a further correction of these approximate areas
by means of the factors a determined by the
method of least squares in exactly the same manner
as demonstrated in the case of Michigan, we
finally arrive at the following areas of the various
groups.



Areas of the component frequency curves in the
Life Table for Massachusetts Males, 1914–1916.

                   Areas
Group I            90064
Group II          281470
Group III         207854
Group IV          151316
Group V            99543
Group VI          107718
Group VIIa & VIIb  40719
Group VIIIa        21316

Total            1000000

Forming the products N F(x) for the various
groups and integral ages we obtain finally the
life table as shown in the appendix. In order
to test the "goodness of fit" of the curves it is
necessary to compute the expected or theoretical
proportionate death ratios from this latter table
and compare such ratios with the observed or actual
proportionate death ratios as shown in Table H.
The theoretical values are shown in Table I, and
a graphical representation illustrating the
"goodness of fit" between the observed and
theoretical ratios is given in Fig. 5. I think it
will be generally admitted that the fit is
satisfactory for all practical purposes.

The State of Massachusetts has always been the 
foremost state in the union for reliable and trust- 
worthy statistical records, and in all probability it 
would be possible to secure the deaths by causes in 
5-year age groups instead of ten-year groups. By 
taking the above table as a first approximation one 
should then obtain a very accurate table. On the 
other hand, it is possible to verify the final results 
in the above Life Table for Massachusetts by an 
entirely different process. It happens that the 
State of Massachusetts took a census in April 1915. 
This census for living males by attained ages could 
then be used as an approximation for the exposed 
to risk, while the deaths for the three years could 
be used as a basis for the number of deaths in a 
single year. A Life Table could then be constructed
by means of the orthodox methods usually
employed by actuaries and statisticians in the
construction of mortality tables from census returns.



12. LOCOMOTIVE ENGINEERS. OTHER TABLES.

As a third illustration, I shall construct a table
for American Locomotive Engineers for the period
1913–1917. The statistical data forming the basic
table are the mortuary records by attained age and
cause of death among the members of The Locomotive
Engineers' Life and Accident Insurance Association,
a large fraternal order of the American Locomotive
Engineers. The total number of deaths in the five
year period amounted to more than 4,000.
Distributed into separate groups of causes of
death, it was found that it was possible to use a
system of frequency curves similar to that employed
in the State of Massachusetts, except for Group
No. IV, for which it was found exceedingly
difficult to find a single curve which would fit
the data, and much points towards the actual
presence of a compound curve for that group of
causes of death among the Locomotive Engineers. The
grouping of causes of death is also slightly
different from that of Michigan and Massachusetts.
I shall not go into further details as to the
actual construction of this table, except to
mention the areas of the various component
frequency curves, of which I present the following
table.

                 Areas
Group I         44,857
Group II       342,645
Group III      226,022
Group IV       147,420
Group V         47,650
Group VI        31,260
Group VIIa      79,005
Group VIIb      77,713
Group VIII       3,428

Total        1,000,000

[Figure: final graph of the component frequency curves, American Locomotive Engineers 1913–1917.]



It must also be remembered that the radix of
this table is taken at age 20, instead of at age 10
as is the case in the preceding tables. The final
graph is shown on the preceding page. A number of
diagrams illustrating the "goodness of fit" are
also attached and need no further comment. It
might, however, be of interest to mention the fact
that the American actuary, Moir, has recently
constructed a mortality table for American
Locomotive Engineers along the orthodox lines from
the data contained in the Medico-Actuarial
Mortality investigation. Moir's table, or at least
the great bulk of the material from which it was
derived, falls in the interval between 1900 and
1913. Owing to the energetic "safety first"
movement which since 1912 has been actively pursued
by most of the leading American railroads, it is,
however, to be expected that the period 1913–1917
indicates a reduced mortality as compared with that
of Moir's period. This fact is also shown in the
diagrams in Fig. 7. 1 On the other hand, the almost
parallel movements of Moir's table with that of the
table of the frequency curve method of 1913–1917
seem to indicate the soundness of the proposed
method.

Fig. 7.

1 Curves I, II and V are Locomotive Engineers'
Mortality Tables for various periods.

12 a. ADDITIONAL MORTALITY TABLES.

A similar table showing mortality conditions among
a decidedly industrial or occupational group has
been constructed for coal miners in the United
States. The original data of the deaths by ages and
specific causes were obtained from the records of
several fraternal orders and a large industrial
life assurance company and comprised nearly 1600
deaths. The number of deaths above the age of sixty
was, however, too small to determine with any
degree of exactitude the area of component curves
for the older age groups. For ages below sixty-five
the table should on the other hand give a true
representation of the mortality among coal miners
in American collieries during the period under
consideration. 1 A particular feature of this table
is the comparatively low mortality in group VI,
which contains primarily deaths from tuberculosis.
Coal miners present in this respect different
conditions from those usually prevailing in dusty
trades where the death rate from tuberculosis is
unusually high. The same feature is also borne out
in previous investigations on the death rate of
coal miners in England, and by the recent
investigations by Mr. F. L. Hoffman on dusty trades
in America.

1 It was not possible to separate anthracite and
bituminous coal miners. The data indicate that
anthracite mine workers have a higher accident rate
than workers in bituminous mines.

In order to have a measure of the mortality pre- 
vailing among industrial workers in America, we 
submit a table derived from a very detailed collection 
of mortuary records by age, sex and cause of death 
as published by the Metropolitan Life Insurance Com- 
pany of New York. A deplorable defect in this splen- 
did collection of data is the grouping together of all 
ages above seventy in a single age group, which 
makes it almost impossible to determine the com- 
ponent curves for higher ages with any degree of 
trustworthiness. 

The defect in the original Metropolitan data for 
older age groups made it necessary to modify the
earlier sets or families of curves which were used 
on the Michigan and Massachusetts data and to 
combine several of the subsidiary component curves, 
especially those for the older age groups. Such 
modifications were, however, easily performed by 
means of simple logarithmic transformations. 

I give below my grouping scheme for the Metro- 
politan data designated by the code numbers of the 
international list of causes of death. The actual 
cause of death corresponding to each code number 
is found under paragraph 8 of the present chapter.

GROUP I
10, 39 to 46, 48, 50, 54, 63 b, 64 to 66, 68, 79, 81,
82, 89 to 91, 94, 96, 97, 103, 105, 109 a, 120, 123, 124,
126, 127, 142, 154.

GROUP II
4, 13, 14, 18, 26, 27, 32 to 35, 47 (over age 20), 49,
51 to 53, 55, 60, 62, 70 to 72, 77, 78, 80, 83 to 88, 92,
95, 98 to 102, 106, 107, 109 b, 110 to 119, 122, 125, 143
to 145, 148, 149, 155 to 163.

GROUP III 
28, 29, 31, 37, 38, 56 to 59, 67. 

GROUP IV a AND IV b 
1, 5 to 9, 17, 19, 20 to 25, 30, 61, 63 a, 73 to 76, 108, 
146, 147, 150, 164 to 186, 47 (under age 20). 

It will be noted that under this scheme Group I 
includes practically Groups I to III of the Michigan 
classification, Group II corresponds partly to IV and 
V for Michigan, Group III is practically Michigan's 
Group VI, while Group IV a and IV b takes in partly 
V, VII, and VIII in the Michigan experience. As a 
further correction I found it also advisable to transfer 
some of the deaths in the age intervals 10 — 14, 15 — 19, 
20—24, and 25—29 in Groups I and II to Group IV a 
so as to avoid the long left tail ends in these older
age curves.

After grouping the deaths (more than 200,000) of
the Metropolitan experience according to the above
scheme, it is a simple matter to compute the various
values of R(x) of the four groups for quinquennial
age intervals and use these values (altogether 52 in
number) for finding the observation equations and in
the subsequent determination of the component curves
as shown in the final mortality table in the appendix
to this chapter. A comparison between the observed
values of R(x) by quinquennial ages and the
continuous values of R(x) (indicated by dotted curves)
as computed from the final mortality table is shown
in Fig. 9. The "fit" between calculated and observed
values is evidently satisfactory.

Fig. 9.

A most instructive and unique experience is of- 
fered in the table of Japanese Assured Males for the 
four year period 1914-1917 and based upon the death 
records of more than a dozen of the leading Japanese 
Life Assurance Companies. About 35,000 deaths by 
cause and arranged in quinquennial age groups were 
available for this construction. The component curves 
for the older age groups were determined by a simple 
logarithmic transformation of the variates and offered 
no particular obstacles in the a priori determination 
of the parameters. The curves for middle and younger 
life were more difficult to handle, especially the 
curves typical of tuberculosis, spinal meningitis and 
the peculiar Oriental disease known as Kakke, aris- 
ing from an excessive rice diet. A first attempt to 
use the same curve types as employed in some of the 
European and American data did result in a very 
poor fit between the observed and calculated values 
of R(x) for the younger age intervals clearly indica- 
ting that the clustering tendencies were different in 
the case of the Japanese data than in the other experi- 
ences I had previously dealt with. 

The peculiar form of the observed values of R(x)
for the tuberculosis group indicated beyond doubt 
that the frequency curve for this group itself was a 
compound curve. I therefore decided to include both 
spinal meningitis and kakke with the tuberculosis 
group, and treat this new group as a compound fre- 
quency curve with two components. By successive 
trials I finally succeeded in establishing a complete 
curve system which satisfied the ultimate require- 
ment of the fit between the observed and calculated 
values of R(x) for the various groups. 1

Grouping of Causes of Death in Japanese Assured 
Males 1914—1917. 
GROUP I 
Diseases of Arteries, Senility, Influenza, Cerebral 
Hemorrhage, Acute and Chronic Bronchitis, Broncho- 
pneumonia. 

GROUP II 
Asthma and Pulmonary Emphysema, Cancer (all 
forms), Tumor, Diabetes, Other Diseases of Body, 
Paralytic Dementia, Tabes Dorsalis, Diseases of other 
organs for circulation of Blood, Chronic Nephritis, 
Other Diseases of Urinary Organs. 

GROUP III

Mental Diseases, Other diseases of Spine and
Medulla Oblongata, Other Diseases of Nervous
System, Diseases of Cardiac Valves, Pneumonia,
Pleurisy, Other Respiratory Diseases, Gastric Catarrh,
Ulcer of Stomach, Hernia, Other Diseases of Stomach,
Diseases of Liver, Acute Nephritis, Diseases of Skin
and Diseases of Motor Organs.

GROUP IV a AND IV b

Typhoid Fever, Malaria, Cholera, Acute Infectious
Diseases, Peritonitis, Suicide, Dysentery, Tuberculosis
(all forms), Syphilis, Kakke, Meningitis, Inflammation
of the Caecum, Death by external causes (accidents,
etc.).

1 See Addenda for the final table.

Arranging the collected Japanese statistics on 
causes of death among assured males by attained 
age at death in accordance with the above scheme 
of grouping, using a 5 year interval as the unit, we 
obtain the following double entry table for the 35207 
deaths as used in my computation for the various 
values of R(x).

Ages     Group I   Group II   Group III   Group IV   Total

10–14         3         4         37         79        123
15–19        17        23        216        714        970
20–24        37        65        181       1640       1923
25–29        62       109        324       1975       2470
30–34       124       257        800       1993       3174
35–39       278       480       1147       2065       3970
40–44       449       662       1299       1674       4084
45–49       701       957       1352       1482       4491
50–54       742       959       1115        990       3806
55–59       864      1045       1041        728       3678
60–64       865       847        874        482       3068
65–69       626       571        612        186       1995
70–74       399       268        347         80       1094
75–79       123        76        100         20        319
80–84        16        13         10          3         42



The observed values of R(x) as derived from the 
above table are shown in the staircase shaped histo- 
graph in Fig. 10. The correlated values of R(x) as 
calculated from the final mortality table are shown 
as dotted curves on the same diagram. The "fit" 
between observed and calculated values of R(x) is 
evidently satisfactory except for the youngest age 
intervals. 
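The observed values of R(x) are simply the share of each group in the deaths of each age interval. A minimal Python sketch of that arithmetic follows, using the figures of the table above; the function and variable names are illustrative only. As an assumption of this sketch, each row is divided by the sum of its four groups rather than by the printed total, so that the ratios add to one (in the 45—49 row the groups as transcribed add to 4492 against a printed total of 4491).

```python
# Deaths among Japanese assured males, 1914-1917, transcribed from the table above.
# Each row: Group I, Group II, Group III, Group IV, printed Total.
DEATHS = {
    "10-14": (3, 4, 37, 79, 123),
    "15-19": (17, 23, 216, 714, 970),
    "20-24": (37, 65, 181, 1640, 1923),
    "25-29": (62, 109, 324, 1975, 2470),
    "30-34": (124, 257, 800, 1993, 3174),
    "35-39": (278, 480, 1147, 2065, 3970),
    "40-44": (449, 662, 1299, 1674, 4084),
    "45-49": (701, 957, 1352, 1482, 4491),
    "50-54": (742, 959, 1115, 990, 3806),
    "55-59": (864, 1045, 1041, 728, 3678),
    "60-64": (865, 847, 874, 482, 3068),
    "65-69": (626, 571, 612, 186, 1995),
    "70-74": (399, 268, 347, 80, 1094),
    "75-79": (123, 76, 100, 20, 319),
    "80-84": (16, 13, 10, 3, 42),
}


def proportionate_ratios(row):
    """Return the share of each group in the deaths of one age interval."""
    *groups, _printed_total = row
    total = sum(groups)  # assumption: divide by the sum of the groups
    return [g / total for g in groups]


ratios = {ages: proportionate_ratios(row) for ages, row in DEATHS.items()}
for ages, r in ratios.items():
    print(ages, " ".join(f"{v:.3f}" for v in r))
```

Each printed line gives the heights of the four staircase diagrams for one age interval.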

The construction of the present Japanese table con- 
stitutes probably the most severe trial to which the 
proposed method has hitherto been put. We are here 
dealing with an entirely different race living under 
different economic conditions than the nations of 
Europe and America and afflicted with certain forms 
of diseases which are comparatively rare or unknown 
among the Western nations. 

It is therefore gratifying to note that the eminent
Japanese actuary, Mr. T. Yano, in comparing the
above mentioned table with an investigation he made
on the aggregate mortality in 1913-1917 of all the
Japanese life assurance companies (about 45 in number)
from the actual number of lives exposed to risk
at various ages has been able to test independently
the validity of the proposed method to complete
satisfaction. (See remarks in preface).

Fig. 10. [Diagram: observed and calculated values of R(x) for Groups I to IV; horizontal axis, age at death.]

13. CRITICISMS AND SUMMARY

With these remarks I shall close the mere technical
discussion of the proposed method and turn my
attention to the arguments advanced by certain
American critics against the possibility of constructing
mortality tables from records of death alone. I deem
no apology necessary to meet those critics and give a
brief historical sketch of the origin of the proposed
method, because remarks along this line will tend to
accentuate the difficulties the mathematically trained
biometrician has to contend with in obtaining a
hearing among the present day school of actuaries and
statisticians.

A good many critics, among whom I may mention
Mr. John S. Thompson and Mr. J. P. Little, apparently
have received an erroneous impression of the
fundamental processes of the proposed method and its
evident departure from the conventional methods.
Mr. Thompson states: "If we understand the process,
the result is simply a graduation of "d x", the "actual"
deaths, and it is not apparent why a mortality table
should not be formed from the unadjusted deaths and
some other function of graduation with equally good
results." 1 From this it would appear that Mr. Thompson
is of the opinion that I have graduated the deaths as
actually observed. As any one who will take the trouble
to read the above article can see, this is not the case.
The actually observed numbers of deaths have only been
used to construct the observed proportionate death
ratios. 2

1 Proceedings of the Casualty Actuarial and Statistical
Society of America, Vol. IV, Pages 399—400.

2 These objections by Thompson and Little are shown
in their full obscurity in the case of the tables for
Locomotive Engineers, Coal Miners and Japanese Assured
Males where the greatest number of observed deaths fell
between ages 35—49.

The whole process may be summarized as follows:

1) The choice (a priori) of a system of frequency
curves based upon the hypothesis that the distribution
of deaths according to age from typical causes of
death can be made to conform to those postulated
frequency curves whose parameters are known or chosen
beforehand.

2) The grouping of causes of death so as to conform
with the above mentioned system of frequency curves.

3) The computation for each age or age group of the
proportionate death ratios of such groups from the
collected statistical data of deaths by age and by
cause of death.

4) The choice of approximate values of the 
areas of the various component frequency curves. 
Such approximate values can be determined by 
inspection or by simple linear correlation methods. 

5) The determination by means of the theory 
of least squares of the various correction factors a 
with which the approximate values of the areas 
must be multiplied in order that we may obtain 
the probable values of the areas of the component 
curves. The observation equations necessary for 
this computation are obtained from the observed 
proportionate death ratios, which are indepen- 
dent of the exposed to risk. 

6) The subsequent calculation of the products 
NF(x) for all groups and for all integral ages. 
This gives us again the total number dying from 
all causes at integral ages among the original 
cohort of 1,000,000 entrants at age 10. In other 
words the d x column from which the final morta- 
lity table can be constructed. 

7) The computation of the "expected" or 
theoretical proportionate death ratios from the 
final table and their subsequent comparison with 
the "actual" or observed proportionate death ra- 
tios to illustrate the "goodness of fit". 

It is this last step which constitutes the verification
of the results derived by means of a purely
deductive or mathematical process, and is a test
of very stringent requirements. The requirement is
that there must be a simultaneous "fit",
not alone for all groups of causes of death, but
for all age intervals as well.
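Steps 6 and 7 of the summary can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the author's computation: ordinary normal curves stand in for the frequency curves actually used, and the three component areas, means and dispersions, as well as the "observed" ratios, are invented for the example.

```python
import math


def normal_curve(x, mean, dispersion):
    """Unit-area normal frequency curve (a stand-in for the author's curves)."""
    z = (x - mean) / dispersion
    return math.exp(-0.5 * z * z) / (dispersion * math.sqrt(2.0 * math.pi))


# Hypothetical components: (area N, mean age, dispersion). Not from the source.
COMPONENTS = {
    "A": (500_000, 70.0, 10.0),
    "B": (300_000, 55.0, 14.0),
    "C": (200_000, 35.0, 12.0),
}


def deaths_by_group(age):
    """Step 6: the products N F(x) for every group at one integral age."""
    return {g: n * normal_curve(age, m, s) for g, (n, m, s) in COMPONENTS.items()}


def expected_ratios(age):
    """Step 7: theoretical proportionate death ratios at one age."""
    parts = deaths_by_group(age)
    d_x = sum(parts.values())  # total deaths from all causes at this age
    return {g: v / d_x for g, v in parts.items()}


def worst_deviation(observed):
    """Largest gap between observed and expected ratios, taken over all
    groups and all ages at once (the simultaneous fit)."""
    return max(
        abs(observed[age][g] - r)
        for age in observed
        for g, r in expected_ratios(age).items()
    )


# Invented "observed" ratios: the expected values shifted by small amounts.
observed = {}
for age in (20, 40, 60, 80):
    exp = expected_ratios(age)
    observed[age] = {g: r + (0.01 if g == "A" else -0.005) for g, r in exp.items()}

print(round(worst_deviation(observed), 4))
```

A single figure of this kind is only a crude summary; the comparison in the text is made graphically, group by group and interval by interval.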

The sole justification of the proposed method 
hinges indeed upon the validity of the hypothesis. 
Is it indeed possible to choose a priori a system 
of frequency curves to which to fit our observed 
data? Theoretically speaking each population or 
sample population, as for instance certain occupa- 
tional groups such as locomotive engineers, far- 
mers, textile workers, miners, etc. will in all pro- 
bability have its own particular system of fre- 
quency curves. From a purely practical point of 
view — and this is the one in which we are chiefly 
interested — we may, however, easily get along 
with a limited system of frequency curves for the
various groups of causes of death and limit our- 
selves to a comparatively few sets of frequency 
curves to which to fit our statistical data. The 
case is analogous to that confronting a manufac- 
turer of shoes. Undoubtedly the foot of one indi- 
vidual is different in form from that of any other 
individual, and in order to get an absolutely fault- 
lessly fitting boot we would all have to go to a 
custom boot maker. Practical experience shows,
however, that it is possible to manufacture a few
sizes of boots, say 6's, 7's, 8's and intermediate
sizes in quarters and halves, so as to fit to complete
satisfaction the footwear of millions of
people. Exactly in the same manner I have found 
from a long and varied experience in practical 
curve fitting that it is possible to fit the mortuary 
records of male deaths by attained age and cause 
of death to a comparatively limited number of sets 
of component curves, say not more than 5 or 6 
sets. Moreover, if in a certain sample population 
a certain curve should not exhibit a satisfactory 
fit it is indeed a simple matter to change its para- 
meters so as to improve the fit. 

14. ADDITIONAL PRINCIPLES OF METHOD

In regard to the classification of the causes of death
into a limited number of groups it seems that some of
the critics of the method are
of the opinion that this classification is ironclad 
and fixed. This, however, is not the case. While 
in a specific sample population a certain cause of 
death might fall in group II, it is quite likely 
that the same cause of death would come under 
another group in another sample population. For 
instance, the deaths from asthma are in Michigan 
grouped under Group II. In the case of Coal 
Miners such deaths would, however, go into group
IV or group V. If the classification of causes of
death were fixed, the frequency curves for separate
populations would show great variations, and it
would be out of the question to limit ourselves to 
a small set of systems of component curves. Mak- 
ing the classification flexible, we are, on the other 
hand, in a better position to proceed with a fewer 
number of curves. For instance, in order to use 
the postulated frequency curve for Group VI for 
Michigan it was necessary to place the cause of 
death listed as No. 186 (other accidental trau- 
matism) of the International Classification of 
Causes of Death in that group instead of in group
V or VII, where most deaths of this type are ordinarily
classed.

It would be interesting to see to what extent 
the proposed classification and the chosen system 
of frequency curves in Michigan deviates from 
the theoretically exact system of frequency curves. 
In the case of Michigan it would be impossible to 
test this. An approximate test might be obtained 
from the Michigan mortality data for the three 
year period 1909 — 1911. Professor Glover has con- 
structed a mortality table for males in the State 
of Michigan in this three-year period by means 
of the usual methods employed by actuaries by 
resorting to the exposed to risk. Starting with a 
radix of 1,000,000 at age 10 it is possible to break
up the deaths or the d x column of the Glover
table into a set of subsidiary columns of death 
from groups of causes of death in the same order 
as given in Table A on page 133 by means of a 
simple application of the observed proportionate 
mortality ratios as derived from the 1909 — 1911 
period. On the basis of a radix of 1,000,000 sur- 
vivors at age 10 we find that according to the 
Glover Table, 5016 will die in the interval from 
50 — 54. Let us also suppose that the proportionate 
mortality ratios in group III for ages 50 — 54 
amounted to 0.23, then the number of deaths from 
group III in that particular interval in the Glover 
table would be 5016 x 0.23 = 1154. Similar num- 
bers could be found for the other groups and for 
arbitrary age intervals, and we would in this man- 
ner have an empirical representation of the fre- 
quency curves. This aspect of the matter is treated 
in brief form on another page. 
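The splitting of a d x figure by the proportionate mortality ratios is plain multiplication. The Python sketch below repeats the example of the text; the function name and the remainder entry "all other groups" are illustrative additions and not part of the original.

```python
def split_deaths(total_deaths, ratios):
    """Allocate the deaths of one age interval to groups of causes in
    proportion to the observed proportionate mortality ratios."""
    return {group: total_deaths * r for group, r in ratios.items()}


# Figures quoted in the text: 5016 deaths at ages 50-54 and a ratio of
# 0.23 for group III. The remainder 0.77 is added here for illustration.
parts = split_deaths(5016, {"III": 0.23, "all other groups": 0.77})
print(round(parts["III"]))  # 1154, as in the text
```

Repeating this for every group and age interval gives the empirical frequency curves referred to above.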

Returning now to our original discussion, it will
readily be admitted that the method of construc- 
ting mortality tables by means of compound fre- 
quency curves cannot be considered as absolutely 
rigorous from the standpoint of pure mathematics. 
But neither can the usual methods of constructing 
mortality tables by graduation processes either by 
analytical formulas, mechanical interpolation formulas
or a simple graphical process be considered
as mathematically exact. All statistical methods
are, in fact, approximation processes. In the 
greater part of the realm of applied mathematics 
we have to resort to such approximation processes. 
It is thus absolutely impossible to solve correctly 
by ordinary algebraic processes simple equations 
of higher degree than the fourth. We encounter, 
however, in every day practice innumerable in- 
stances in which an approximation process, as for 
instance Newton's or Horner's methods or the 
method of finite differences, is sufficiently close to 
determine the roots of any equation so as to satisfy 
all practical requirements. 
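As an illustration of such an approximation process, the following Python sketch applies Newton's method to an equation of the fifth degree. The particular equation, x^5 - x - 1 = 0, and the starting value are chosen here purely as an example and do not come from the text.

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton's method: repeat x <- x - f(x)/f'(x) until the step is tiny."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("no convergence within max_iter steps")


def f(x):
    return x**5 - x - 1


def df(x):
    return 5 * x**4 - 1


root = newton(f, df, x0=1.0)
print(root, f(root))
```

A handful of iterations brings the residual f(root) down to the limit of machine precision, which is closer than any practical requirement.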

From this point of view I claim that the pro- 
posed method in the hands of adequately trained 
statisticians will yield satisfactory results, and I 
am inclined to think that the results are probably 
as true as the ones obtained by means of the usual 
methods, which especially in the case of gradua- 
tion by interpolation formulas often are affected 
with serious systematic errors. Moreover, there 
are sound philosophical and biological principles 
underlying the proposed method, which is perhaps 
more than can be said about the usual methods, 
purely empirical in scope and principle. On the 
other hand, I will readily admit that the proposed 
method is by no means a simple rule of thumb
and it can under no circumstances be entrusted to
the hands of amateurs. The whole process can in
my opinion only be employed when placed in the 
hands of the adequately trained statistician who is 
thoroughly familiar with his mathematical tools, 
as provided in the formulas from the probability 
calculus. Such adequate training is not acquired 
over night, but only through a long and patient 
study. Meticulous and patient work is often re- 
quired before one is finally brought upon the right 
track, especially in the classification of the causes 
of death. Failure upon failure is oftentimes en- 
countered by the beginner in this work, and it is 
probably only through such failures that the in- 
vestigator is enabled to avoid the pitfalls of the 
often treacherous facts as disclosed by statistical 
data and steer a clear course. Mathematical skill 
is only acquired through a long and careful study. 
The illustrious saying of the Greek geometer, 
Euclid, who once told the Ptolemaian emperor 
that "there is no royal road in mathematics" holds 
true to-day as it did in the days of antiquity. 

The fact that the method is no simple mechani- 
cal rule, but one which can be entrusted into skill- 
ful hands only, is, moreover, in my opinion, one 
of its strong points, because it eliminates all attempts
of dilettantes to make use of it. A large
manufacturing plant would not, for instance, put 
an ordinary blacksmith or horseshoer to work on
making the fine tools for certain parts of automatic
machinery employed in the manufacture of
staple articles. Only the most skilled and highly 
trained tool makers are able to produce machine 
parts, which often require precision measurements 
running into one thousandth part of an inch. Nor 
would a large contracting firm dream of putting 
a backwoods carpenter in charge of the construc- 
tion of a skyscraper. Yet, this case is absolutely 
analogous to that of letting the mere collector of 
crude statistical data make an analysis and draw 
conclusions from certain collected facts as ex- 
pressed in statistical series of various sorts. 

While some American critics to all appearances 
have misunderstood the principles underlying the 
method, several European reviewers of the short 
summary of the method as originally published in 
the "Proceedings of the Casualty Actuarial and 
Statistical Society of America" evidently have un- 
derstood its fundamental principles completely. 
The European critics seem, however, to be of the 
opinion that there is a rather prohibitive amount 
of arithmetical work involved in the actual con- 
struction of the mortality table. Thus a review in 
the Journal of the Royal Statistical Society for 
May 1918 has this to say : 

"Mr. Fisher's object is to construct a life 
table, being given only the deaths at ages and
not the population at risk. The hypothesis
employed is that the total frequency of deaths 
can be resolved into specific groups of deaths, 
the frequencies of which cluster around cer- 
tain ages. The parameters of these sub-fre- 
quencies having been determined, the areas 
are deduced from a system of frequency curves
of the form:

$$R_B(x) = \frac{N_B F_B(x)}{N_B F_B(x) + N_C F_C(x) + N_D F_D(x) + \dots}$$

where $R_B(x)$ is the proportional mortality at
age x of deaths due to causes in group B, and
$F_B(x)$ is obtained from the equation of the
sub-frequency curve for cause B, while
$N_B + N_C + N_D + \dots + N_E = 1,000,000$. The
values of R(x) provide a system of observa- 
tional equations from which (by least squares) 
the values of N B , &c., can be obtained. 

"Since particularly in industrial statistics, 
or in general statistical inquiries under war 
conditions it is easier to obtain accurate data 
of deaths at ages than of exposed to risk, the 
success of the method is encouraging. It is, 
however, to be noted that the amount of arithmetical
work involved is considerable. Quite
apart from the determination of the para- 
meters of the frequency curves, the formation 
and solution of the normal equations needed 
to compute the areas is a heavy piece of work. 
It would be of interest to see whether the re- 
solution into but three components effected by 
Professor Karl Pearson in his well-known
essay published in the "Chances of Death"
could be made to describe with sufficient ac- 
curacy an ordinary tabulation of deaths from 
age 10 onwards to lead to approximately cor- 
rect results for life table purposes. The test 
should, of course, be made with mortality 
data derived from a population very far from 
being stationary and the deductions compared 
with the results of standard methods. The 
subject is one of peculiar interest at the pre- 
sent time." 
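The determination of the areas from the observed ratios, as summarized by the reviewer, may be sketched as follows. This is one possible linear formulation and is not claimed to be the computation actually carried out, which works with correction factors applied to approximate areas. It rests on the remark that R_k(x) times the sum of all N_j F_j(x), less N_k F_k(x), must vanish, which is linear in the areas. Normal curves stand in for the sub-frequency curves, and all numerical values are invented so that the result can be checked against the areas the data were built from.

```python
import math


def normal_curve(x, mean, dispersion):
    """Unit-area normal curve, a stand-in for the sub-frequency curves."""
    z = (x - mean) / dispersion
    return math.exp(-0.5 * z * z) / (dispersion * math.sqrt(2.0 * math.pi))


# Hypothetical curves (mean, dispersion) and areas; not taken from the source.
CURVES = [(70.0, 10.0), (55.0, 14.0), (35.0, 12.0)]
TRUE_AREAS = [500_000.0, 300_000.0, 200_000.0]
TOTAL = 1_000_000.0
AGES = range(15, 90, 5)


def ratios(areas, age):
    """Proportionate death ratios R_k(x) implied by a set of areas."""
    parts = [n * normal_curve(age, m, s) for n, (m, s) in zip(areas, CURVES)]
    d = sum(parts)
    return [p / d for p in parts]


def observation_equations(observed):
    """One linear equation per observed ratio, in the shares p_j = N_j / TOTAL:
    sum_j (R_k(x) - [j == k]) F_j(x) p_j = 0, plus the condition sum_j p_j = 1."""
    rows, rhs = [], []
    for age, r in observed.items():
        f = [normal_curve(age, m, s) for m, s in CURVES]
        for k in range(len(CURVES)):
            rows.append([(r[k] - (j == k)) * f[j] for j in range(len(CURVES))])
            rhs.append(0.0)
    rows.append([1.0] * len(CURVES))
    rhs.append(1.0)
    return rows, rhs


def least_squares(rows, rhs):
    """Form the normal equations A'A p = A'b and solve by Gauss-Jordan elimination."""
    n = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * b for r, b in zip(rows, rhs))] for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda i: abs(m[i][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for i in range(n):
            if i != c:
                factor = m[i][c] / m[c][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Synthetic "observations" generated from the known areas.
observed = {age: ratios(TRUE_AREAS, age) for age in AGES}
shares = least_squares(*observation_equations(observed))
areas = [TOTAL * p for p in shares]
print([round(a) for a in areas])
```

With real observations the equations are no longer satisfied exactly, and the least squares solution then distributes the discrepancies over all ages and groups.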

From the above quotation it is evident that this 
English reviewer has a clear conception of the 
fundamental principles upon which the method is 
based. His criticism is mainly directed against 
the heavy piece of arithmetical work involved. 
This work can, however, not be compared with 
the much more difficult task of obtaining the ex- 
posed to risk at various ages, which under all cir- 
cumstances would take much greater time and be 
infinitely more costly, in fact be absolutely pro- 
hibitive from a financial point of view. I wish in 
this connection to state that the whole arithmeti- 
cal work involved in the construction of the Michi- 
gan table was done by two computers in less than 
70 hours, while the corresponding table for Mas- 
sachusetts took about 75 hours. I do not know if 
this can be called exactly prohibitive. 

In regard to the remarks of my British critic
concerning the Pearsonian method I might add
that in my first attempt at an analysis of mortality
conditions along the lines as described above I
tried to subdivide the causes of death into four
groups. It was, however, found that this was not
always sufficient to describe the frequency distribution
of the number of deaths around certain
ages. I doubt whether it is at all possible to describe
the frequency distribution in the various sub-groups
by a system of normal curves, which, of
course, would somewhat lessen the work. I have
made attempts to do this, but so far I have not
been successful except in a few cases. 1 It might
be possible that we should succeed in this if we 
first set up a hypothetically determined curve of 
the numbers exposed to risk. Such a curve might, 
for instance, be a normal curve. Personally, I be- 
lieve that little would be gained by such a proce- 
dure. More fruitful appears an analysis by means 
of correlation surfaces. The mortality table con- 
structed by the process as I have described it con- 
stitutes in its final form a correlation surface, 
wherein the age at death and the group of causes 
of death are the independent variables, and the 
number of deaths at a certain age and from a
certain group of causes of death is the numerical
value of the correlation function of the two variates.
Provided one could obtain an exact equation
of such a correlation surface, it would be a
simple matter to construct a mortality table, and
I hope that some statistician may in the future be
induced to attempt a solution of the problem in
this light.

1 See Addenda for the Metropolitan Table and the
Japanese Table.



15. ANOTHER APPLICATION OF THE FREQUENCY CURVE METHOD

Before closing the discussion of this subject we
shall, however, give a brief description of another
application of compound frequency curves in
the construction of mortality tables. We have here 
reference to the use of skew frequency curves in 
the graduation of crude mortality rates as com- 
puted in the usual empirical manner as the ratio 
of deaths to the number of lives exposed to risk 
at various ages. On page 165 it was mentioned 
that the State of Massachusetts took a census in 
April 1915. This census together with the deaths 
for the triennial period from 1914 — 1916 makes 
it an easy matter to construct a mortality table in 
the conventional manner. Moreover, such a table 
can be compared with the previously constructed 
table from mortuary records by sex, age and cause
of death only and shown in the appendix. 

In this connection it might be worth mentioning
that my first table for Massachusetts as constructed
by compound frequency curves was prepared during
the summer of 1918 and first presented in a series
of lectures delivered at the University of Michigan
during the month of March 1919, while the final
official report of the 1915 Massachusetts census did
not come into the hands of the present writer before
May 1919.

The official census of the population of Massachusetts
by sex and single ages is given on page
478 in Vol. III of the Massachusetts report from
which Fig. 11 has been constructed. It is seen
from a mere glance at this graph that there is an
unduly high tendency among the figures to cluster 
around ages being multiples of 5. This tendency 
is especially marked in the age interval 30 — 60 
and presents a defect which is of no small im- 
portance in the construction of a mortality table 
by means of the conventional methods. It is in- 
deed doubtful if a table constructed from data 
so greatly influenced by observation errors and 
misstatements of ages can be considered as ab- 
solutely trustworthy. On the other hand the data 
ought to be sufficiently exact to test the results 
arrived at by the proposed method of compound 
frequency curves. 

We give below the male population in 5 year
age groups for the middle census year of 1915
and the corresponding deaths from all causes
during the triennial period 1914—1916.

MASSACHUSETTS

1915 Male Population and Number of Deaths
among Males from 1914—1916.

Ages          Population, L x    Deaths 1914—1916, D x
5—9               169010                1715
10—14             152419                1004
15—19             154773                1537
20—24             171961                2353
25—29             171017                2726
30—34             149294                2979
35—39             142617                3535
40—44             125462                4007
45—49             107909                4393
50—54              89490                5026
55—59              65133                5459
60—64              49079                5679
65—69              34790                6027
70—74              23638                5946
75—79              13724                4752
80—84               6494                3166
85—89               2479                1751
90—94                530                 540
95—99                124                 133
100 & over            12                  23

A few small discrepancies will be found to exist 
between this table and the table printed on page 
163, giving the observed deaths from various 
causes in ten year age intervals. This arises solely 
from the fact that a number of deaths were re- 
corded where the contributing cause was unknown 
and could, therefore, not be distributed in their 
proper groups. But this defect is of no influence
in the construction of a mortality table by means
of the method of compound frequency curves, unless
all the causes reported as unknown should
happen to belong to the same group, which hardly 
can be assumed to be the case. At any rate the 
proportionate death ratios which are the keystone 
in this method of construction are for practical 
purposes left unaltered whether we include or ex- 
clude these few numbers of unknown causes. In 
the usual way of constructing tables from ex- 
posures and number of deaths it is on the other 
hand absolutely essential to include all deaths as 
otherwise the death rate will be underestimated. 
Bearing these facts in mind we therefore refer 
to the above figures of L x and D x for Massachu- 
setts Males from which we without further diffi- 
culty can construct an empirical mortality table, 
either by graphic methods or by simple summa- 
tion or interpolation formulas. There is indeed no 
dearth of such formulas, of which a large number 
have been devised by Milne, Wittstein, Woolhouse, 
Higham, Sprague, Hardy, King, Spencer, Hen- 
derson, Westergaard, Gram, Karup and several 
other investigators. In the following computation 
I have used a formula originally devised by the 
Italian statistician, Novalis, and later on some- 
what modified by the English actuary, King. 
The following schedule shows the actual process
in detail.

MASSACHUSETTS MALES.

A. Population.

Graduated Quinquennial Pivotal Values.

Ages       Population L_x    ΔL_x     Δ²L_x    Age   Graduated Population
5—9            169010       -16591
10—14          152419        +2354   +18945     12         29332
15—19          154773       +17188   +14834     17         30836
20—24          171961         -944   -18132     22         34537
25—29          171017       -21723   -20779     27         34369
30—34          149294        -6677   +15047     32         29739
35—39          142617       -17155   -10478     37         28607
40—44          125462       -17553     -398     42         25095
45—49          107909       -18419     -866     47         21587
50—54           89490       -24357    -5938     52         17946
55—59           65133       -16054    +8293     57         12961
60—64           49079       -14289    +1765     62          9802
65—69           34790       -11152    +3137     67          6933
70—74           23638        -9914    +1238     72          4717
75—79           13724        -8130    +1884     77          2731
80—84            6494        -4015    +4115     82          1265
85—89            2479        -1949    +2066     87           480
90—94             530         -406    +1543     92           104
95—99             124         -112     +294     97            23
100—104            12                          102             1

Graduated Population: $u_{x+7} = 0.2\,L_{x+5} - 0.008\,\Delta^2 L_{x+5}$

B. Deaths 1914—1916.

Graduated Quinquennial Pivotal Values.

Ages       No. of Deaths D_x    ΔD_x    Δ²D_x    Age   Graduated Deaths
5—9              1715           -711
10—14            1004           +533    +1244     12         200.8
15—19            1537           +816     +283     17         307.4
20—24            2353           +373     -443     22         470.6
25—29            2726           +253     -120     27         545.2
30—34            2979           +556     +303     32         595.8
35—39            3535           +472      -84     37         707.0
40—44            4007           +386      -86     42         801.4
45—49            4393           +633     +247     47         878.6
50—54            5026           +433     -200     52        1005.2
55—59            5459           +220     -213     57        1091.8
60—64            5679           +348     +128     62        1125.8
65—69            6027            -81     -429     67        1205.4
70—74            5946          -1194    -1113     72        1189.2
75—79            4752          -1586     -392     77         950.4
80—84            3166          -1415     +171     82         633.2
85—89            1751          -1211     +204     87         350.2
90—94             540           -407     +804     92         108.0
95—99             133           -110     +297     97          26.6
100—104            23                            102           4.6

In this manner we obtain the graduated quinquennial
pivotal values of the population and of
the deaths for ages 12, 17, 22, 27, ... etc. Then
by dividing one third of the graduated deaths by
the population we have the graduated pivotal
values of the so-called "central death rates", or
m x for quinquennial ages from age 12 and up.
From these values of m x we easily find the corresponding
values of q x by means of the formula:

$$q_x = \frac{2 m_x}{2 + m_x}$$
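The schedule can be followed in a short Python sketch for the single pivotal age 17. The function names are illustrative. The second difference is taken centrally, that is, from the group in question and its two neighbours, which is how the tabulated differences are arranged. The sketch applies the pivotal formula to the deaths as well as to the population, in accordance with the text, and small departures from the printed decimals are to be expected.

```python
# Quinquennial totals for three consecutive age groups (Massachusetts males),
# taken from the population and deaths columns of the schedule.
POPULATION = {"10-14": 152419, "15-19": 154773, "20-24": 171961}
DEATHS = {"10-14": 1004, "15-19": 1537, "20-24": 2353}


def pivotal_value(prev, mid, nxt):
    """Graduated value at the central age of the middle group:
    0.2 * (group total) - 0.008 * (central second difference)."""
    second_difference = nxt - 2 * mid + prev
    return 0.2 * mid - 0.008 * second_difference


def central_rate(deaths_3yr, population):
    """m_x: one third of the three-year deaths divided by the population."""
    return (deaths_3yr / 3.0) / population


def q_from_m(m):
    """Probability of dying within the year from the central death rate."""
    return 2.0 * m / (2.0 + m)


pop_17 = pivotal_value(*POPULATION.values())
deaths_17 = pivotal_value(*DEATHS.values())
m_17 = central_rate(deaths_17, pop_17)
print(round(pop_17), round(1000 * m_17, 2), round(1000 * q_from_m(m_17), 2))
```

The graduated population at age 17 comes out at 30836, in agreement with the schedule.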
We give below the results of this computation.

Massachusetts Males 1914—1916.

Age    1000 q x from Novalis' Formula
12          2.21
17          3.33
22          4.64
27          5.29
32          6.68
37          8.25
42         10.65
47         13.53
52         18.67
57         26.38
62         38.29
67         58.12
72         81.90
77        109.91
82        165.02
87        240.18
92        325.64

The intervening values of q x are without difficulty
derived by interpolation formulas or by a
graphical process. Once having all the values of 
q x for separate ages from age 10 and up it is a 
simple matter to form tables of l x and d x commen- 
cing with a radix of 1,000,000 at age 10. Without 
going into tedious details we present the following 
values of l x for decimal ages. 

Massachusetts Males 1914—1916.

Age        l x         Ages         Σ d x
10      1,000,000     10—19         27,700
20        972,300     20—29         47,330
30        924,970     30—39         66,750
40        858,220     40—49         98,650
50        759,570     50—59        153,900
60        605,670     60—69        233,150
70        372,520     70—79        237,130
80        135,390     80—89        124,760
90         10,640     90 & over     10,640
100            32
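The formation of the l x and d x columns from the values of q x is a simple recursion, sketched below in Python. The three values of q x are invented for the illustration and are not the Massachusetts figures; the function name is likewise illustrative.

```python
def life_table(q_by_age, radix=1_000_000):
    """Build l_x and d_x from one-year probabilities of death q_x."""
    l_x, rows = float(radix), []
    for age in sorted(q_by_age):
        d_x = l_x * q_by_age[age]  # deaths between age x and x + 1
        rows.append((age, l_x, d_x))
        l_x -= d_x  # survivors to age x + 1
    return rows


# Invented q_x for three ages, for illustration only.
q = {10: 0.002, 11: 0.002, 12: 0.0022}
rows = life_table(q)
for age, l, d in rows:
    print(age, round(l), round(d))
```

Summing the d x column over ten consecutive ages gives entries of the kind shown in the right-hand half of the table above.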



16. GRADUATION BY FREQUENCY CURVES

It is to this table that we now shall apply a process
of re-graduation by means of the method of compound
frequency curves. Here we
have already an empirical representation of the
total compound curve of death or the d x curve.
This compound curve can now by simple and
straightforward processes be broken up into its 
various component parts as to causes of deaths by 
means of the various observed proportionate mor- 
tality ratios, R x shown in Table H on page 163. 

Let us for the sake of illustration take the age 
interval 40 — 49. According to our empirically con- 
structed table as derived from the Massachusetts 
1915 census we find that the number of deaths 
among the survivors in this age interval amounts 
to 98,650. 

Applying to this number the observed propor- 
tionate death ratios, B , in table H we are able to 
break this number up into its various component 
parts according to the groups of causes of death 
from which the numerical values of R x were de- 
rived. These component parts are as follows : 



Group Nc 


i. of Deaths 


I 


1180 


II 


18050 


III 


17170 


IV 


17170 


V 


14300 


VI 


23970 


VII a & b 


5820 


VIII 


990 


Total : 


98650 
In the same manner we can break up the compound
curve (the $d_x$ curve) into its eight component
parts for all other age intervals, which finally gives
us the table of component groups printed on the
preceding page. Graphically this table will represent
a series of frequency diagrams of the various groups
of causes of death. It is an easy matter to fit such
diagrams to a system of Laplacean-Charlier or
Poisson-Charlier frequency curves, which symbolically
may be represented as follows:

$$N_{I}F_{I}(x),\quad N_{II}F_{II}(x),\quad N_{III}F_{III}(x),\quad \ldots$$

where $F(x)$ is the frequency function of the
percentage distribution according to age of the
various component groups or curves, while $N$ stands
for the areas of such curves.

These curve areas are simply the sub-totals of
the respective groups in the above table. The
parameters giving the equations of the curves
$F_{I}(x)$, $F_{II}(x)$, $F_{III}(x)$, ... are easily
computed by the method of moments and are shown
in the following table on page 207.
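
As a rough illustration of the method of moments for one component group, the sketch below computes the mean, the dispersion and the standardized third and fourth moments from grouped deaths. The data are hypothetical, and the function name is an illustrative choice. The skewness and excess tabulated for the component curves may be scaled differently from these plain standardized moments, so the last two outputs are only stand-ins for them.

```python
import math


def grouped_moments(midpoints, frequencies):
    """Mean, dispersion and standardized 3rd/4th moments of grouped data."""
    n = sum(frequencies)
    mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n

    def central(k):
        # k-th central moment of the grouped distribution
        return sum(f * (x - mean) ** k
                   for x, f in zip(midpoints, frequencies)) / n

    dispersion = math.sqrt(central(2))
    third = central(3) / dispersion ** 3          # zero for a symmetric curve
    fourth_excess = central(4) / dispersion ** 4 - 3.0
    return mean, dispersion, third, fourth_excess


# Hypothetical deaths of one group at the central ages of 10-year intervals.
ages = [25, 35, 45, 55, 65]
deaths = [1, 4, 6, 4, 1]
print(grouped_moments(ages, deaths))
```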

Once having determined the parameters of the 
various frequency curves it is a simple matter to 
construct the final mortality table which is shown 
in the addenda. 
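
The construction of the final table from the component curves can be sketched as follows, assuming only the usual life-table relations $d_x = \sum$ of the component ordinates, $l_{x+1} = l_x - d_x$ and $q_x = d_x / l_x$. The function name and the radix argument are illustrative; the three rows of input are the first ages of the Japanese Assured Males table in the addenda.

```python
def build_life_table(start_age, component_rows, radix=1_000_000):
    """Return rows (age, d_x, l_x, 1000 q_x) from component deaths per age."""
    table = []
    survivors = radix
    for offset, components in enumerate(component_rows):
        deaths = sum(components)             # d_x: ordinate of compound curve
        rate = 1000.0 * deaths / survivors   # 1000 q_x = 1000 d_x / l_x
        table.append((start_age + offset, deaths, survivors, round(rate, 2)))
        survivors -= deaths                  # l_(x+1) = l_x - d_x
    return table


# Component deaths (groups I, II, III, IVb) at ages 15, 16 and 17.
rows = [(24, 65, 343, 2379), (39, 74, 360, 3645), (43, 84, 388, 4888)]
for row in build_life_table(15, rows):
    print(row)
```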
Values of Parameters of Component Curves,
Massachusetts, 1914-1916, Males.¹

Group    Mean    Dispersion    Skewness    Excess

I        75.0        9.78       +0.080     -0.005
II       67.5       13.65       +0.117     +0.017
III      64.0       14.12       +0.124     +0.030
IV       60.5       16.51       +0.089     -0.006
V        50.0       18.61       +0.026     -0.034
VI       43.5       15.57       -0.036     -0.023
VIIb     57.5       16.33       -0.027     -0.028

¹ In this grouping I have combined VIIa and VIII
into a single group and roughly fitted this group to a
truncated Poisson-Charlier curve. This, of course, is not
exact and evidently introduces errors in the youngest
age interval, 10-19. For ages above 20 this curve
is of no importance, and the other curves should give
a satisfactory fit for those ages. If absolute
exactitude were required for the younger ages it would
offer no difficulty to compute curves VIIa and
VIII separately and thus obtain a much closer fit in
the youngest age interval. In view of the fact that
the present calculation is a test case only, it has not
been thought necessary to go to these refinements.
This defect will of course also affect group VIIb
to a slight extent.

It now remains for us to compare the final values
of $q_x$ which we obtain from the three tables:

A) The values of $q_x$ as computed in the usual
way from the number of lives exposed to risk and
the corresponding deaths at various ages.

B) The values of $q_x$ as obtained by a re-graduation
of the mortality table under A by means of
compound frequency curves.

C) The values of $q_x$ constructed from mortuary
records by sex, age and cause of death, but without
knowing the numbers of lives exposed to risk.

Massachusetts Males, 1914-1916.
Values of 1000 $q_x$ by various methods.

Age         A         B         C

 17       3.33      3.15      3.27
 22       4.64      3.99      4.28
 27       5.29      5.04      5.46
 32       6.68      6.72      7.03
 37       8.25      8.63      8.88
 42      10.65     10.83     11.05
 47      13.53     13.86     14.05
 52      18.67     18.83     19.13
 57      26.38     26.88     27.66
 62      38.29     38.79     40.26
 67      58.12     59.04     56.54
 72      81.90     76.50     77.61
 77     109.91    103.69    107.51
 82     165.02    137.97    148.79

I think that every unbiased investigator will
admit that there exists a close agreement between
the three series. It is indeed difficult to
say which one of the three is the most probable.
We know that on account of the great perturbations
due to misstatements of ages the values
under A are affected with considerable errors. The
usual interpolation or summation formulas do not
suffice to remove these errors and often tend to
increase them. A re-graduation by means of
frequency curves as shown in series B will in all
probability give better results, although on account
of the large age interval (10 years) in which
the causes of death are grouped in the Massachusetts
reports this method cannot be used to its
full advantage.¹ The values of $q_x$ under A and B are
naturally closely related to each other, and those
in series B cannot be derived unless the values
in series A are known beforehand. Series C on
the other hand is independent of either A or B,
having been derived by means of entirely different
methods of construction.
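
The degree of agreement can be put into figures with a short sketch such as the following, which expresses series B and C as percentage deviations from series A. The values are those of the table above; taking A as the reference, and the helper name `percent_deviation`, are choices made here for illustration only.

```python
# Values of 1000 q_x from the table above: (age, A, B, C).
series = [
    (17, 3.33, 3.15, 3.27), (22, 4.64, 3.99, 4.28),
    (27, 5.29, 5.04, 5.46), (32, 6.68, 6.72, 7.03),
    (37, 8.25, 8.63, 8.88), (42, 10.65, 10.83, 11.05),
    (47, 13.53, 13.86, 14.05), (52, 18.67, 18.83, 19.13),
    (57, 26.38, 26.88, 27.66), (62, 38.29, 38.79, 40.26),
    (67, 58.12, 59.04, 56.54), (72, 81.90, 76.50, 77.61),
    (77, 109.91, 103.69, 107.51), (82, 165.02, 137.97, 148.79),
]


def percent_deviation(value, reference):
    """Deviation of value from reference, in per cent of the reference."""
    return 100.0 * (value - reference) / reference


deviations = [
    (age, percent_deviation(b, a), percent_deviation(c, a))
    for age, a, b, c in series
]
for age, dev_b, dev_c in deviations:
    print(f"{age:3d}  B: {dev_b:+6.1f}%  C: {dev_c:+6.1f}%")

# Ages at which B and C depart most from A.
worst_b = max(deviations, key=lambda row: abs(row[1]))
worst_c = max(deviations, key=lambda row: abs(row[2]))
```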



17. Comparison between different methods.

A comparison between the parameters of the
separate component curves in B and C
gives us, however, a way of testing the validity
of the hypothesis upon which the method of
series C rests. In the case of series C we started
with the hypothesis of the existence of a set
of frequency curves of the percentage distribution
of the number of deaths according to age among
the various groups. On the basis of this hypothesis
and from the observed values of the proportionate
death ratios, $R_x$, we determined by the method
of least squares the areas of this postulated set of
frequency curves. In the case of the B series we
broke up the empirically constructed compound
death curve (the $d_x$ curve) into its various
component parts according to a similar classification
of causes of death as under C. We have therefore
in this case an empirical determination of the
areas of the component curves, and all that we
need to do is to graduate the rough frequency
diagrams as represented by such areas to a system
of frequency curves.

¹ See footnote on page 127.

Let us now briefly examine how far the various 
skew frequency curves in series B and C differ 
from each other. In regard to the various statis- 
tical parameters of the separate groups we have 
the following results : 





Means.

Group    Series C    Series B

I          78.5        75.0
II         68.0        67.5
III        63.0        64.0
IV         60.5        60.5
V          49.5        50.0
VI         44.0        43.5
VIIb       57.5        57.5

Dispersions.

Group    Series C    Series B

I          7.98        9.78
II        12.21       13.65
III       13.05       14.12
IV        17.86       16.51
V         18.51       18.61
VI        14.68       15.57
VIIb      12.16       16.33

Skewness.

Group    Series C    Series B

I         +0.092      +0.080
II        +0.115      +0.117
III       +0.121      +0.124
IV        +0.098      +0.089
V         +0.033      +0.026
VI        -0.010      -0.036
VIIb      -0.002      -0.027

Excess.

Group    Series C    Series B

I         -0.033      -0.005
II        +0.023      +0.017
III       +0.047      +0.030
IV        -0.009      -0.006
V         -0.031      -0.034
VI        -0.027      -0.023
VIIb      -0.003      -0.028



Taken all in all there is found to exist a satis- 
factory agreement between the hypothetical va- 
lues in series C and the values derived by empiri- 
cal methods. It is only in group I that we find 
some important discrepancies. This group contains 
causes of death typical of extreme old age where 
we naturally may expect great perturbations 
owing to large errors from random sampling, 
especially in series B. In this same connection
we may also mention that the empirically determined
values under series B are subject to a slight
correction by means of the Sheppard formulas,
which were not employed in my computations.
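
To see where the two sets of parameters part company, one may tabulate the gaps directly, as in the sketch below. The pairs are copied from the tables of means and dispersions above; the helper `largest_gap` is a name introduced here for illustration.

```python
# (Series C, Series B) pairs copied from the comparison tables above.
means = {
    "I": (78.5, 75.0), "II": (68.0, 67.5), "III": (63.0, 64.0),
    "IV": (60.5, 60.5), "V": (49.5, 50.0), "VI": (44.0, 43.5),
    "VIIb": (57.5, 57.5),
}
dispersions = {
    "I": (7.98, 9.78), "II": (12.21, 13.65), "III": (13.05, 14.12),
    "IV": (17.86, 16.51), "V": (18.51, 18.61), "VI": (14.68, 15.57),
    "VIIb": (12.16, 16.33),
}


def largest_gap(table):
    """Group with the largest absolute difference between Series C and B."""
    return max(table, key=lambda group: abs(table[group][0] - table[group][1]))


for name, table in (("mean", means), ("dispersion", dispersions)):
    group = largest_gap(table)
    c, b = table[group]
    print(f"largest {name} gap: group {group} (C = {c}, B = {b})")
```

On these figures the largest gap in the means falls on group I, and the largest gap in the dispersions on group VIIb.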

We have already mentioned that the system
of frequency curves which we chose a priori
for Massachusetts (Series C) was the same system
which we had used on a previous occasion
in the construction of a mortality table for English
Males for the period 1911-1912.¹ This is a
fact of no small importance. It will in general be
found that the percentage distribution according 
to age in the various component curves differs 
little in different sample populations. Even in the 
case of American Locomotive Engineers it was 
found possible to use the same set of curves as in 
the case of Massachusetts and England and Wales. 
In the same way I have found that the set of 
curves used in the construction of the table of 
Michigan Males also can be used in the case of 
males in the urban population of Denmark. With 
a very few exceptions I have found it possible 
to get along with a limited number of sets of 
curves, say four or five sets. Should it never- 
theless prove impossible to fit the original data to 
any one of these particular curve systems, it will 
in most cases be found possible by means of suc- 
cessive approximations to reach a system of cur- 
ves which may be made the a priori basis for the 
construction of the final table as was the case in 
the table for Japanese assured males. 

Finally we come to the comparison of the vari- 
ous areas of the component curves. We have 
here : 



¹ See "Proceedings of the Casualty Actuarial Society
of America", Vol. IV, page 409.








Areas.

Group          Series C    Series B

I                 90064      105000
II               281470      296190
III              207854      213010
IV               151316      144200
V                 99543       87850
VI               107718      106260
VII & VIII        62035       47410

Total           1000000     1000000

Evidently the agreement is not so close in this
case. But it would indeed be rather rash to assert
that the values in series C are faulty. One must
here bear in mind the diametrically opposite
principles employed in the determination of these
areas. In series B we have a direct determination
by empirical methods. In this determination we
shall, however, find reflected all the systematic
and observational errors originally present
in series A, from which the curves under B
were computed. Every error due to misstatements
of ages, and every systematic error introduced by the
summation or interpolation formulas, will be
directly reflected in the areas under series B, and
such areas can therefore in a sense only be
considered as a first approximation to the true or
presumptive areas.
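
The size of the disagreement may be gauged by a sketch like the one below, which expresses each series B area as a percentage deviation from the corresponding series C area and adds up both columns. The figures are those of the table of areas above; the variable names are illustrative.

```python
# Areas of the component curves: group -> (Series C, Series B).
areas = {
    "I": (90064, 105000), "II": (281470, 296190),
    "III": (207854, 213010), "IV": (151316, 144200),
    "V": (99543, 87850), "VI": (107718, 106260),
    "VII & VIII": (62035, 47410),
}

total_c = sum(c for c, _ in areas.values())
total_b = sum(b for _, b in areas.values())

# Percentage deviation of the Series B area from the Series C area.
relative_gap = {g: 100.0 * (b - c) / c for g, (c, b) in areas.items()}
for group, gap in relative_gap.items():
    c, b = areas[group]
    print(f"{group:10s} {c:8d} {b:8d} {gap:+7.1f}%")
print("totals:", total_c, total_b)

widest = max(relative_gap, key=lambda g: abs(relative_gap[g]))
```

With the figures as they appear here the series C column adds up to 1,000,000 exactly, while the series B column adds up to 999,920, so that one of the B entries may be slightly misprinted.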
Another point well worth remembering is
that no conditions are imposed upon the areas
in series B. In series C, where we work with
mortuary records only, we have on the other hand the
very important condition or restriction requiring
that the areas of the component curves must be
so determined that their ratios to the compound
curve for various age intervals will conform as
closely as possible with the observed proportionate
death ratios, $R_x$, for those same age intervals.
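
In symbols, the restriction just described may be sketched as follows. The index $K$ running over all component groups is introduced here only for the summation; $N_B$ and $F_B(x)$ are the area and the frequency function of group $B$ as before.

```latex
R_B(x) \approx \frac{N_B F_B(x)}{\sum_{K} N_K F_K(x)},
\qquad B = \mathrm{I}, \mathrm{II}, \mathrm{III}, \ldots
```

The areas $N_B$ of series C are those which bring the right-hand side as near as possible to the observed ratios over all age intervals.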

In order to test the influence of this additional
requirement in respect to conformity to observed
proportionate death ratios we might use the values
of the component curves under series B as a first
approximation and then afterwards determine the
correction factors a for the areas in exactly the
same way as in the case of series C. No doubt
such a calculation would tend to improve the
table.

A difficulty occurs, however, in the case of
the Massachusetts data owing to the large interval
of 10 years into which the causes of death by
attained ages are grouped. As pointed out in the
footnote on page 127 the quantity $R_B(x)$,
($x = 10, 11, 12, \ldots, 100$; $B = \mathrm{I}, \mathrm{II}, \mathrm{III}, \ldots$),
can only be considered as being independent of
the "exposed to risk" if the age interval into which
the deaths fall is sufficiently small. If this is not
the case, the "central" values of $R_B(x)$ are
subject to certain corrections. In the case of the
groups of causes of death typical of younger ages
the observed "central" values of $R_{VII}(x)$ and
$R_{VIII}(x)$ for the age intervals 10-19, 20-29,
30-39 are evidently too high, while on the other
hand the values of $R_{I}(x)$ and $R_{II}(x)$ in the case
of the age intervals 60-69, 70-79, 80-89,
90-100 are too low as compared with the true
values of $R(x)$ at these "central" ages. I have,
however, tacitly ignored this fact in my computations.
The result is that the final
values of $q_x$ for the younger ages in column C as
shown on page 208 are in all probability a little
too high, and the values of $q_x$ above 65 too low.
In the case of the other tables shown in the
present book the age interval into which the causes
of death were arranged was 5 years or less, and
the error was thus reduced to such an extent that
further corrections may be disregarded for all
practical purposes.



ADDENDA I 



Showing Detailed Mortality Tables and Death 
Curves for 

1) Japanese Assured Males (1914 — 1917) 

2) Metropolitan Life. White Males (1911—1916) 

3) American Coal Miners (1913—1917) 

4) American Locomotive Engineers (1913 — 1917) 

5) Massachusetts Males (Series C) (1914—1916) 

6) Michigan Males (1909—1915) 

7) Massachusetts Males (Series B) (1914—1916). 
Mortality Table: Japanese Assured Males,
1914-1917 (Aggregate Table)

Age      I     II    III    IVa    IVb     dx       lx   1000qx

 15     24     65    343      -   2379   2811  1000000     2.81
 16     39     74    360      -   3645   4118   997189     4.13
 17     43     84    388      -   4888   5403   993071     5.44
 18     48     93    415      -   5981   6557   987668     6.64
 19     54    107    446      -   6826   7433   981111     7.58
 20     60    120    478      -   7447   8105   973678     8.32
 21     68    135    513      -   7716   8432   965573     8.73
 22     77    153    550     12   7734   8526   957141     8.91
 23     87    171    591     27   7581   8457   948615     8.92
 24    101    195    633     50   7274   8253   940158     8.86
 25    111    218    678     77   6864   7948   931905     8.53
 26    126    246    729    112   6384   7597   923957     8.22
 27    140    278    780    153   5860   7211   916360     7.87
 28    160    315    838    206   5341   6860   909149     7.54
 29    178    353    899    268   4821   6519   902289     7.22
 30    198    395    963    341   4323   6220   895770     6.94
 31    227    446   1033    425   3853   5984   889550     6.73
 32    252    501   1109    521   3421   5804   883566     6.59
 33    286    557   1185    629   3021   5678   877762     6.46
 34    319    626   1273    751   2665   5633   872084     6.46
 35    358    700   1364    885   2336   5643   866451     6.51
 36    401    779   1460   1031   2048   5719   860808     6.64
 37    450    872   1564   1186   1797   5869   855089     6.86
 38    502    970   1671   1350   1566   6059   849220     7.13
 39    570   1081   1791   1524   1366   6332   843161     7.51
 40    638   1197   1916   1701   1191   6643   836829     7.94
 41    716   1332   2049   1883   1037   7017   830186     8.45
 42    802   1475   2193   2066    903   7439   823169     9.04
 43    899   1632   2341   2249    783   7904   815730     9.69
 44   1005   1799   2501   2428    680   8413   807826    10.41
 45   1126   1985   2671   2599    598   8979   799413    11.23
 46   1261   2180   2852   2764    514   9571   790434    12.10
 47   1406   2393   3042   2917    447  10205   780863    13.07
 48   1575   2611   3236   3061    395  10878   770658    14.12
 49   1764   2867   3459   3187    339  11606   759780    15.27
 50   1957   3122   3666   3298    295  12338   748174    16.49
 51   2180   3395   3892   3389    257  13113   735836    17.82
 52   2426   3679   4136   3473    224  13938   722723    19.29
 53   2692   3984   4380   3532    195  14783   708785    20.86
 54   2987   4285   4638   3576    172  15658   694002    22.56
 55   3306   4610   4922   3611    147  16596   678344    24.47
 56   3654   4940   5177   3612    130  17513   661748    26.46
 57   4026   5274   5456   3605    113  18474   644235    28.68
 58   4432   5603   5742   3581     97  19455   625761    31.09
 59   4857   5937   6025   3544     84  20447   606306    33.72
 60   5316   6257   6316   3498     74  21461   585859    36.63
 61   5795   6568   6604   3424     69  22460   564398    39.79
 62   6293   6860   6890   3345     59  23447   541938    43.27
 63   6805   7129   7162   3255     51  24402   518491    47.15
 64   7332   7361   7423   3150     43  25309   494089    51.22
 65   7854   7570   7672   3042     38  26176   468780    55.84
 66   8366   7727   7896   2919     36  26944   442604    60.88
 67   8863   7838   8089   2791     31  27612   415660    66.43
 68   9313   7894   8257   2655     28  28147   388048    72.53
 69   9719   7894   8385   2511     23  28532   359901    79.27
 70  10053   7829   8468   2362     20  28732   331369    86.71
 71  10294   7700   8503   2212     18  28727   302637    94.92
 72  10424   7496   8477   2067     15  28479   273910   103.97
 73  10424   7227   8389   1901     13  27954   245431   110.69
 74  10280   6897   8230   1746     13  27166   217477   124.91
 75   9970   6503   8002   1593     10  26078   190311   137.02
 76   9492   6057   7695   1444     10  24698   164233   150.38
 77   8834   5571   7313   1298      8  23024   139535   165.00
 78   8037   5047   6853   1159      7  21103   116511   181.12
 79   7086   4499   6314   1026      6  18931    95408   198.42
 80   6046   3943   5733    900      5  16621    76477   217.33
 81   4953   3400   5091    784      4  14232    59856   237.77
 82   3871   2862   4421    676      3  11833    45624   259.35
 83   2813   2365   3730    577      2   9487    33791   280.75
 84   1957   1907   3046    489      1   7400    24304   304.48
 85   1232   1498   2396    412      -   5538    16904   327.61
 86    701   1141   1797    340      -   3979    11366   350.08
 87    343    844   1275    277      -   2739     7387   370.76
 88    140    603    844    225      -   1812     4648   389.78
 89     48    408    516    179      -   1151     2836   405.85
 90     14    269    283    141      -    707     1685   419.58
 91      5    171    134    110      -    420      978   429.44
 92      -    111     53     83      -    247      558   442.65
 93      -     56     14     63      -    133      311   452.10
 94      -     28      4     44      -     76      178   457.05
 95      -     14      2     31      -     47      102   460.78
 96      -      5      1     22      -     28       55   509.01
 97      -      -      -     14      -     14       27   518.50
 98      -      -      -      9      -      9       13   692.30
 99      -      -      -      4      -      4        4  1000.00



Mortality Table: Metropolitan White Males, 1911-1916

Age      I     II    III    IVb    IVa     dx       lx   1000qx

 10     80    153    205     47   1720   2205  1000000     2.21
 11     95    179    274     61   1776   2385   997795     2.39
 12    118    210    350     77   1812   2567   995410     2.58
 13    141    244    444     96   1832   2757   992843     2.78
 14    168    282    550    116   1834   2950   990086     2.98
 15    202    327    671    140   1825   3165   987136     3.21
 16    240    373    810    171   1803   3397   983971     3.45
 17    282    427    960    199   1772   3640   980574     3.71
 18    336    483   1130    233   1733   3915   976934     4.01
 19    393    545   1315    274   1680   4207   973019     4.32
 20    454    611   1514    311   1612   4502   968812     4.65
 21    527    685   1728    358   1539   4837   964310     5.02
 22    599    765   1951    407   1449   5169   959473     5.39
 23    687    845   2184    459   1363   5538   954304     5.80
 24    775    932   2428    515   1279   5929   948766     6.25
 25    874   1024   2674    575   1190   6337   942837     6.72
 26    977   1120   2924    638   1107   6766   936500     7.32
 27   1088   1223   3173    703   1012   7199   929734     7.74
 28   1202   1328   3414    770    923   7637   922535     8.28
 29   1324   1436   3648    839    840   8087   914898     8.84
 30   1473   1549   3879    909    757   8567   906811     9.45
 31   1584   1662   4089    985    684   9004   898244    10.02
 32   1702   1779   4283   1052    614   9430   889240    10.60
 33   1863   1899   4459   1125    545   9891   879810    11.24
 34   2012   2015   4604   1196    485  10312   869919    11.85
 35   2160   2139   4740   1266    427  10732   859607    12.48
 36   2324   2259   4842   1332    378  11135   848875    13.12
 37   2485   2379   4919   1399    335  11517   837740    13.75
 38   2664   2501   4968   1462    296  11891   826223    14.39
 39   2847   2617   4989   1520    258  12231   814332    15.02
 40   3057   2734   4988   1577    226  12578   802101    15.68
 41   3272   2848   4953   1628    192  12893   789523    16.33
 42   3508   2960   4898   1675    163  13204   776630    17.00
 43   3767   3066   4821   1719    143  13516   763426    17.70
 44   4057   3170   4719   1757    120  13823   749910    18.43
 45   4389   3267   4604   1789    100  14149   736087    19.22
 46   4748   3358   4471   1816     90  14483   721938    20.06
 47   5153   3447   4320   1839     75  14834   707455    20.97
 48   5599   3526   4160   1855     61  15201   692621    21.95
 49   6064   3598   3991   1867     50  15590   677420    23.01
 50   6631   3663   3810   1872     42  16018   661830    24.20
 51   7198   3721   3630   1872     35  16456   645812    25.48
 52   7820   3769   3443   1867     30  16929   629356    26.90
 53   8492   3809   3254   1857     22  17434   612427    28.47
 54   9168   3839   3069   1840     10  17926   594993    30.13
 55   9897   3858   2876   1820      1  18452   577067    31.98
 56  10637   3868   2696   1793      -  18994   558615    34.00
 57  11378   3867   2519   1762      -  19526   539621    36.18
 58  12114   3853   2340   1726      -  20033   520095    38.52
 59  12847   3830   2169   1687      -  20533   500062    41.06
 60  13555   3794   2004   1640      -  20991   479529    43.77
 61  14217   3746   1844   1591      -  21398   458538    46.67
 62  14817   3685   1692   1541      -  21735   437140    49.72
 63  15359   3615   1547   1484      -  22005   415405    52.97
 64  15820   3535   1408   1425      -  22188   393400    56.40
 65  16179   3443   1277   1364      -  22263   371212    59.97
 66  16450   3340   1153   1299      -  22242   348949    63.74
 67  16610   3229   1037   1235      -  22111   326707    67.68
 68  16691   3109    930   1166      -  21896   304596    71.89
 69  16591   2981    828   1098      -  21498   282700    76.05
 70  16412   2851    736   1030      -  21029   261202    80.51
 71  16107   2711    649    955      -  20422   240173    85.03
 72  15721   2568    571    892      -  19752   219751    89.88
 73  15225   2423    500    825      -  18973   199999    94.87
 74  14629   2271    434    759      -  18093   181026    99.95
 75  13946   2126    377    695      -  17144   162933   105.22
 76  13225   1976    325    632      -  16158   145789   110.83
 77  12423   1828    278    572      -  15101   129631   116.49
 78  11580   1684    237    515      -  14016   114530   122.38
 79  10729   1543    200    461      -  12933   100514   128.67
 80   9840   1406    167    411      -  11824    87581   135.01
 81   8950   1272    138    363      -  10723    75757   141.54
 82   8092   1144    115    318      -   9669    65034   148.68
 83   7237   1024     98    282      -   8641    55365   156.07
 84   6420    911     79    247      -   7657    46724   163.88
 85   5645    806     65    208      -   6724    39067   172.11
 86   4920    707     53    181      -   5861    32343   181.21
 87   4240    615     43    150      -   5048    26482   190.62
 88   3622    531     34    126      -   4313    21434   201.22
 89   3065    457     27    106      -   3655    17121   213.48
 90   2550    387     22     87      -   3046    13466   226.20
 91   2099    327     16     70      -   2512    10420   241.07
 92   1698    270     14     56      -   2038     7908   257.71
 93   1355    222     11     45      -   1633     5870   278.19
 94   1053    179      8     35      -   1275     4237   300.92
 95    805    143      6     27      -    981     2962   331.20
 96    595    112      5     20      -    732     1981   369.51
 97    412     85      1     14      -    512     1249   409.93
 98    286     62      -     10      -    358      737   485.75
 99    198     27      -      6      -    231      379   609.50
100     95     15      -      4      -    114      148   770.27
101     27      5      -      2      -     34       34  1000.00



Mortality Table: American Coal Miners
(1913-1917)

Age      I     II    III     IV     Va     Vb     VI     dx       lx   1000qx

 18      -     99    124    142   4566      7    366   5304  1000000     5.30
 19      -    114    144    164   4702     10    408   5542   994696     5.57
 20      -    140    168    187   4954     14    452   5915   989154     5.98
 21      -    162    194    214   5196     19    498   6283   983239     6.39
 22      -    190    223    243   5234     27    546   6463   976956     6.62
 23      -    223    250    272   5151     38    597   6531   970493     6.73
 24      -    256    282    307   5067     50    646   6608   963962     6.86
 25      -    298    315    341   4952     69    697   6672   957354     6.97
 26      -    341    349    379   4846     91    749   6755   950682     7.11
 27      -    390    386    421   4748    120    802   6867   943927     7.27
 28      -    440    424    465   4683    156    853   7021   937060     7.49
 29      -    498    461    508   4569    202    903   7141   930039     7.68
 30      -    557    500    560   4413    257    953   7240   922898     7.84
 31      -    622    538    609   4220    326   1002   7317   915658     7.99
 32      -    688    579    663   4000    408   1048   7386   908341     8.13
 33      -    761    618    718   3757    505   1093   7452   900955     8.27
 34      -    837    654    777   3500    618   1133   7519   893503     8.42
 35      -    915    693    840   3233    749   1175   7605   885984     8.58
 36      -    994    732    905   2963    898   1212   7704   878379     8.77
 37      -   1084    775    973   2697   1064   1246   7839   870675     9.00
 38      -   1171    818   1045   2435   1251   1277   7997   862836     9.27
 39      -   1267    867   1124   2184   1452   1305   8199   854839     9.59
 40      -   1364    920   1206   1946   1667   1329   8432   846640     9.96
 41      -   1471    978   1293   1723   1894   1352   8711   838208    10.39
 42      -   1581   1045   1386   1515   2131   1369   9027   829497    10.88
 43      -   1705   1125   1489   1325   2372   1383   9399   820470    11.46
 44      -   1835   1222   1585   1106   2609   1395   9752   811071    12.02
 45      1   1976   1322   1712    883   2841   1403  10133   801319    12.65
 46      6   2132   1444   1837    853   3063   1408  10743   791186    13.58
 47     10   2302   1584   1971    729   3265   1410  11271   780443    14.44
 48     21   2492   1741   2114    619   3443   1408  11838   769172    15.39
 49     32   2705   1918   2265    524   3595   1402  12441   757334    16.43
 50     42   2934   2118   2423    442   3706   1395  13060   744893    17.53
 51     54   3190   2337   2589    368   3790   1383  13711   731833    18.74
 52     73   3470   2567   2764    307   3832   1368  14380   718122    20.02
 53     94   3775   2820   2945    255   3832   1352  15073   703742    21.42
 54    123   4104   3086   3130    210   3790   1331  15774   688669    22.91
 55    153   4437   3355   3313    173   3706   1308  16445   672895    24.44
 56    185   4843   3637   3501    141   3595   1281  17183   656450    26.18
 57    225   5246   3922   3689    115   3443   1252  17892   639267    27.99
 58    268   5656   4192   3872     93   3265   1220  18566   621375    29.88
 59    310   6085   4454   4047     76   3063   1186  19221   602809    31.89
 60    354   6530   4703   4209     61   2841   1148  19846   583588    34.01
 61    402   6970   4936   4364     48   2609   1109  20438   563742    36.25
 62    450   7403   5133   4500     39   2372   1076  20964   543304    38.59
 63    508   7832   5305   4618     30   2131   1023  21447   522340    41.05
 64    573   8230   5438   4718     24   1894    978  21855   500893    43.63
 65    648   8615   5533   4795     19   1667    931  22208   479038    46.36
 66    746   8954   5581   4846     15   1452    884  22478   456830    49.20
 67    875   9255   5596   4871     13   1251    834  22695   434352    52.25
 68   1015   9507   5563   4871      9   1064    785  22814   411657    55.41
 69   1207   9704   5479   4841      6    898    736  22871   388843    58.81
 70   1437   9846   5358   4786      6    749    686  22868   365972    62.49
 71   1702   9917   5196   4701      4    618    637  22775   343104    66.38
 72   2008   9931   4999   4592      4    505    588  22627   320329    70.64
 73   2334   9871   4771   4460      2    408    540  22386   297702    75.20
 74   2677   9747   4513   4302      2    326    494  22061   275316    80.10
 75   3028   9557   4233   4125      2    257    449  21651   253255    85.49
 76   3332   9307   3941   3929      1    202    408  21120   231604    91.19
 77   3610   9001   3638   3722      1    156    366  20494   210484    97.37
 78   3827   8643   3322   3496      -    120    329  19737   189990   103.88
 79   3967   8237   3012   3267      -     91    293  18867   170253   110.82
 80   4020   7799   2704   3029      -     69    258  17879   151386   118.10
 81   3980   7327   2411   2788      -     50    226  16782   133507   125.70
 82   3916   6803   2123   2552      -     38    198  15630   116725   133.90
 83   3658   6315   1846   2313      -     27    171  14330   101095   141.75
 84   3370   5801   1596   2085      -     19    147  13018    86765   150.04
 85   3040   5286   1366   1862      -     14    125  11693    73747   158.56
 86   2684   4776   1151   1650      -     10    105  10376    62054   167.21
 87   2305   4281    957   1448      -      7     88   9086    51678   175.82
 88   1937   3809    789   1261      -      5     71   7872    42592   184.82
 89   1584   3353    640   1085      -      3     60   6725    34720   193.69
 90   1269   2924    513    927      -      2     48   5683    27995   203.00
 91    985   2535    404    784      -      2     38   4748    22312   212.80
 92    747   2168    310    650      -      1     29   3905    17564   222.33
 93    551   1845    231    531      -      -     22   3180    13659   232.81
 94    396   1545    170    428      -      -     17   2556    10479   243.92
 95    278   1279    119    338      -      -     12   2026     7923   255.71
 96    198   1050     79    261      -      -      7   1594     5897   270.31
 97    126    845     48    195      -      -      5   1219     4303   283.29
 98     85    672     26    140      -      -      2    925     3084   299.94
 99     70    525      9     96      -      -      -    701     2159   324.69
100     35    401      -     59      -      -      -    495     1458   339.51
101     24    298      -     29      -      -      -    351      963   364.48
102     19    217      -      4      -      -      -    240      612   392.16
103     14    149      -      -      -      -      -    163      372   488.17
104     10     97      -      -      -      -      -    107      209   511.96
105      8     55      -      -      -      -      -     63      102   727.65
106      6     25      -      -      -      -      -     31       39   794.87
107      3      2      -      -      -      -      -      5        8   625.00
108      2      -      -      -      -      -      -      2        3   666.67
109      1      -      -      -      -      -      -      1        1  1000.00
ADDENDA II



In order to show a rapid application of frequency
curve methods to the graduation of mortality tables
when the number of lives exposed to risk at various
ages is known, the following data, relating to
applicants who had been rejected for life assurance on
account of impaired health by Scandinavian
assurance companies, are instructive. The original
statistics, as collected by a committee of the insurance
companies, were first published in the quinquennial
report (1910-1915) of the Danish Government Life
Assurance Institution (the Statsanstalt) for 1917.

The material related to Scandinavian and Finnish
applicants who prior to 1893 (and, in the case
of two Danish companies, before 1899) had been
rejected for life assurance. By a special investigation,
the committee followed up these rejections and sought
to establish whether the applicants were alive on July
1, 1899, or had previously died. Detailed
reports for the full period during which the risks were
under observation were available for 8,208 individual
applicants. For 2,023 applicants complete data were
not available.

The final statistical results of the Statsanstalt's
investigation are shown in the following summary
table:
TABLE I.

Mortuary Experience of Rejected Risks of Scandinavian
Life Companies.

Attained    No. Exposed    Number
Age         to Risk        of Deaths

15-19            434            6
20-24          3,831           28
25-29         11,405          145
30-34         17,644          233
35-39         19,442          318
40-44         17,600          324
45-49         13,971          296
50-54         10,179          295
55-59          6,640          264
60-64          3,927          194
65-69          1,995           96
70-74            836           71
75-79            306           32
80-84             98           20
85-89             12            3

The exposed to risk by separate ages and the
corresponding deaths are shown in Table II in Columns
2 and 3, from which we obtain without difficulty the
crude or ungraduated mortality rates, as shown in
Column 4.

We next assume a purely hypothetical frequency
distribution of the exposed to risk, according to age,
represented by a Laplacean normal probability curve
with its mean or origin at age fifty and a dispersion
equal to 12.5 years, as shown in Column 5. The
frequency distribution of the number of deaths on the
basis of the ungraduated mortality rates in Column 4
and the above-mentioned normal probability curve is
shown in Column 6, which may be considered as an
ungraduated compound frequency curve.
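
As an illustration only, the construction of Columns 4, 5 and 6 can be sketched in a few lines of Python. The scale factor of 1,250,000 applied to the normal curve is not stated in the text; it is an assumption inferred from the peak value of 39,894 in Column 5. The function names are likewise hypothetical.

```python
import math

MEAN_AGE = 50.0      # origin of the hypothetical exposure curve (from the text)
DISPERSION = 12.5    # dispersion in years (from the text)
SCALE = 1_250_000    # ASSUMPTION: inferred from the Column 5 peak, not stated in the text


def crude_rate(deaths, exposed):
    """Column 4: ungraduated mortality rate, deaths divided by exposed to risk."""
    return deaths / exposed


def hypothetical_exposure(age):
    """Column 5: ordinate of a normal curve centred at age fifty."""
    z = (age - MEAN_AGE) / DISPERSION
    density = math.exp(-0.5 * z * z) / (DISPERSION * math.sqrt(2.0 * math.pi))
    return SCALE * density


def crude_death_curve(age, deaths, exposed):
    """Column 6: hypothetical exposure multiplied by the crude rate."""
    return hypothetical_exposure(age) * crude_rate(deaths, exposed)


# Age 50 in Table II: 2348 exposed to risk, 61 deaths.
print(round(crude_rate(61, 2348), 5))
print(round(hypothetical_exposure(50)))
print(round(crude_death_curve(50, 61, 2348)))
```

Under these assumptions the three printed values can be compared directly with the entries for age 50 in Columns 4, 5 and 6 of Table II.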

Arranged in quinquennial age intervals, this latter
frequency distribution is shown in the following
summary table:



Ages           No. of Deaths

13-17                51
18-22                75
23-27               329
28-32               711
33-37             1,464
38-42             2,498
43-47             3,649
48-52             5,377
53-57             6,238
58-62             6,232
63-67             5,254
68-72             3,605 ¹
73-77             2,536
78-82             1,425
83-87             1,169
88-92               351
93 or over           95

Total            41,059



¹ A slight adjustment was made in the figures in column (6)
corresponding to age 70, and in the age groups above the age of 88.

The above frequency distribution is now subjected
to a graduation by means of the Laplacean-Charlier
or Gram-Charlier frequency function. The
mathematical calculations give the following parameters:

Mean Age      57.75 years
Dispersion    13.32 years
Skewness      -0.0031
Excess        -0.0037

Applying these parameters to standard probability
tables, we obtain the usual Laplacean-Charlier
frequency curve. Distributing the 41,059 individual
deaths according to this frequency curve, we obtain
column (7), which is the graduated death curve
corresponding to the hypothetical exposure as given by
column (5). The final mortality rates per 1,000 of
exposed to risk are then found by dividing (7) by
(5) and are shown in column (8).
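
The graduation step can be sketched as follows. The sketch assumes that the frequency function has the form F(z) = phi(z) + b3 phi'''(z) + b4 phi''''(z), with the tabulated skewness and excess used as the coefficients b3 and b4. That convention is an assumption, as are the function names and the 1,250,000 scale of the hypothetical exposure, which is inferred from Column 5. Because the published parameters are rounded, the results can only be expected to agree approximately with Table II.

```python
import math

TOTAL_DEATHS = 41_059
MEAN, DISPERSION = 57.75, 13.32
SKEWNESS, EXCESS = -0.0031, -0.0037   # used as b3 and b4 (ASSUMED convention)


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def graduated_deaths(age):
    """Column 7 (sketch): deaths distributed by the assumed frequency function."""
    z = (age - MEAN) / DISPERSION
    third = -(z**3 - 3.0 * z) * phi(z)            # third derivative of phi
    fourth = (z**4 - 6.0 * z**2 + 3.0) * phi(z)   # fourth derivative of phi
    density = phi(z) + SKEWNESS * third + EXCESS * fourth
    return TOTAL_DEATHS * density / DISPERSION


def hypothetical_exposure(age):
    """Column 5: normal curve centred at fifty, dispersion 12.5 (scale ASSUMED)."""
    z = (age - 50.0) / 12.5
    return 1_250_000 * phi(z) / 12.5


def graduated_rate_per_mille(age):
    """Column 8: graduated deaths divided by hypothetical exposure, per 1,000."""
    return 1000.0 * graduated_deaths(age) / hypothetical_exposure(age)


for age in (40, 58):
    print(age, round(graduated_deaths(age), 1), round(graduated_rate_per_mille(age), 2))
```

For ages 40 and 58 the values produced this way fall within one per cent of the corresponding entries in columns (7) and (8), which suggests, but does not prove, that the assumed form is close to the one actually used.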

In order to show how closely the graduation by
means of frequency curves agrees with the actual
observations, I have made a comparison of the
"actual" with the "expected" deaths by quinquennial
age intervals, as shown in the following table:

TABLE III.

Comparison between "Actual" and "Expected"
Deaths on the Basis of the Graduated Mortality
Rates of the Scandinavian Mortality Table for
Rejected Lives

Ages      No. Exposed    Actual    Expected
          to Risk        Deaths    Deaths

15-19          434           6         3.4
20-24        3,831          28        37.6
25-29       11,405         145       133.4
30-34       17,644         233       242.2
35-39       19,442         318       314.3
40-44       17,600         324       336.8
45-49       13,971         296       321.8
50-54       10,179         295       287.2
55-59        6,640         264       234.8
60-64        3,927         194       178.6
65-69        1,995          96       119.5
70-74          836          71        67.4
75-79          306          32        33.8
80-84           98          20        15.1
85-89           12           3         2.5

Total      108,320       2,325     2,328.4
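
The "expected" deaths of a quinquennial group are presumably obtained by applying the graduated rates of column (8) to the actual exposed to risk of column (2) and summing over the ages in the group. A minimal sketch for the group 15-19, with the figures copied from Table II (the variable names are illustrative):

```python
# (age, exposed to risk, graduated 1000qx) for ages 15-19, copied from Table II
rows = [
    (15, 11, 7.07),
    (16, 31, 7.07),
    (17, 64, 7.52),
    (18, 121, 7.77),
    (19, 207, 8.36),
]

# Expected deaths: exposure at each age times the graduated rate per 1,000.
expected = sum(exposed * rate / 1000.0 for _, exposed, rate in rows)
actual = 6  # deaths observed at ages 15-19 (Table I)

print(f"expected {expected:.2f}, actual {actual}")
```

The result agrees with the entry of 3.4 in Table III to within rounding.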

Considering the somewhat meager experience on
which the graduation was based, I think it must be
admitted that the method of frequency curves comes
surprisingly close to the actual facts. In this connection
it is of interest to note that the actuaries of the
Danish Statsanstalt made a graduation of the above
data on the basis of Makeham's method and obtained
by least square methods the following values for
the constants:¹

A = 0.006

log B = 7.0566 - 10

log C = 0.025
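
For orientation, the constants can be turned into mortality rates under the usual form of Makeham's law, in which the force of mortality at age x is A + B c^x. That the constants refer to this form, and the conversion to an annual rate used below, are assumptions. The text gives only the constants and the reference to the Text Book.

```python
import math

A = 0.006
B = 10.0 ** (7.0566 - 10.0)   # from log B = 7.0566 - 10 (common logarithm)
C = 10.0 ** 0.025             # from log C = 0.025 (common logarithm)


def force_of_mortality(age):
    """Makeham force of mortality, ASSUMING the form A + B * C**age."""
    return A + B * C ** age


def annual_rate(age):
    """Annual death rate q_x = 1 - exp(-integral of the force over one year)."""
    integral = A + B * C ** age * (C - 1.0) / math.log(C)
    return 1.0 - math.exp(-integral)


for age in (30, 50, 70):
    print(age, round(force_of_mortality(age), 5), round(1000.0 * annual_rate(age), 2))
```

On these assumptions the rate at age 50 comes out between 26 and 27 per 1,000, of the same order as the frequency curve graduation at that age.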

The "expected" deaths according to this latter
graduation, and on the basis of the above experience,
amount in total to 2,317 as against 2,325 "actual"
deaths and 2,328 "expected" deaths according to the
frequency curve method. Viewed from the standpoint
of the principle of least squares, it is also found
that the sum of the squares of the deviations is
smaller under the frequency curve method than under the
method of Makeham, which seems to be pretty good
evidence of the soundness of the method in spite of
the fact that I have worked throughout with
unweighted observations. If properly chosen weights
were applied to the observations, even closer results
could be obtained.

¹ See formula (6), page 192 of the Institute of Actuaries Text Book,
Life Contingencies, by E. E. Spurgeon, London, 1922.
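
The unweighted sum of squared deviations for the frequency curve method can be checked directly from Table III, as in the sketch below. The corresponding figure for Makeham's method cannot be reproduced in the same way, since the text gives only its total of 2,317 expected deaths and not the figures by age group.

```python
# Actual and expected deaths by quinquennial age group, copied from Table III.
actual = [6, 28, 145, 233, 318, 324, 296, 295, 264, 194, 96, 71, 32, 20, 3]
expected = [3.4, 37.6, 133.4, 242.2, 314.3, 336.8, 321.8, 287.2,
            234.8, 178.6, 119.5, 67.4, 33.8, 15.1, 2.5]

# Unweighted least-squares criterion: sum of squared (actual - expected).
sum_of_squares = sum((a - e) ** 2 for a, e in zip(actual, expected))

print(sum(actual), round(sum(expected), 1), round(sum_of_squares, 2))
```

A weighted version would divide each squared deviation by a variance estimate for its group; the choice of weights is left open in the text.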



TABLE II.

Mortality Experience of Rejected Scandinavian Risks
(Male).

Columns: (1) Age; (2) Exposed to Risk; (3) No. of Deaths;
(4) = (3) : (2); (5) Hypothetical Exposure;
(6) = (5) x (4), Crude Death Curve;
(7) Graduated Death Curve; (8) = (7) : (5), 1000qx.

 (1)      (2)     (3)       (4)      (5)     (6)      (7)      (8)
  15       11       -   0.00000      792       -      5.6     7.07
  16       31       1   0.03226      987      32      7.1     7.07
  17       64       1   0.01562     1223      19      9.2     7.52
  18      121       -   0.00000     1506       -     11.7     7.77
  19      207       4   0.01932     1842      36     15.4     8.36
  20      340       1   0.00294     2239       7     19.7     8.80
  21      501       1   0.00200     2705       5     25.0     9.24
  22      719       6   0.00834     3246      27     30.8     9.49
  23      982       6   0.00611     3871      24     38.8    10.02
  24     1289      14   0.01086     4586      50     47.8    10.42
  25     1619      22   0.01359     5399      73     58.2    10.78
  26     1986      23   0.01158     6316      73     70.6    11.18
  27     2287      34   0.01487     7341     109     85.0    11.58
  28     2597      29   0.01117     8478      95    101.7    12.00
  29     2916      37   0.01269     9728     123    120.5    12.39
  30     3180      38   0.01195    11092     133    142.0    12.80
  31     3395      50   0.01473    12566     185    166.4    13.24
  32     3564      44   0.01235    14146     175    193.5    13.68
  33     3700      46   0.01243    15822     197    223.4    14.12
  34     3806      55   0.01445    17585     254    257.0    14.61
  35     3882      48   0.01236    19419     240    293.3    15.10
  36     3943      64   0.01623    21307     346    332.8    15.62
  37     3921      72   0.01836    23230     427    375.3    16.16
  38     3880      66   0.01701    25164     428    420.0    16.69
  39     3816      68   0.01782    27086     483    467.7    17.27
  40     3737      66   0.01766    28969     512    517.6    17.87
  41     3637      63   0.01732    30785     533    566.9    18.41
  42     3539      59   0.01667    32506     542    623.3    19.17
  43     3426      62   0.01810    34105     617    678.2    19.89
  44     3261      74   0.02269    35553     807    732.7    20.61
  45     3079      67   0.02176    36827     801    787.8    21.39
  46     2941      61   0.02074    37903     786    842.4    22.23
  47     2793      46   0.01647    38762     638    895.1    22.97
  48     2653      61   0.02299    39387     906    945.9    24.02
  49     2505      61   0.02435    39767     968    994.3    25.00
  50     2348      61   0.02598    39894    1036   1039.0    26.04
  51     2184      65   0.02976    39767    1183   1079.9    27.16
  52     2024      66   0.03261    39387    1284   1116.0    28.33
  53     1882      59   0.03135    38762    1215   1147.4    29.53
  54     1741      44   0.02527    37903     958   1173.3    30.96
  55     1610      62   0.03851    36827    1418   1193.0    32.39
  56     1447      60   0.04147    35553    1474   1206.9    33.95
  57     1308      45   0.03440    34105    1173   1214.3    35.60
  58     1189      47   0.03953    32506    1285   1214.9    37.37
  59     1086      50   0.04604    30785    1417   1209.0    39.27
  60      966      44   0.04555    28969    1320   1197.0    41.32
  61      871      35   0.04019    27086    1089   1178.8    43.52
  62      786      35   0.04453    25164    1121   1154.2    45.87
  63      701      44   0.06277    23230    1458   1124.6    48.41
  64      603      36   0.05970    21307    1272   1090.1    51.16
  65      518      22   0.04247    19419     825   1050.7    54.11
  66      453      24   0.05298    17585     932   1006.3    57.22
  67      392      19   0.04847    15822     767    960.1    60.68
  68      340      16   0.04706    14146     666    909.6    64.30
  69      291      15   0.05155    12566     648    858.4    68.31
  70      244      25   0.10246    11092    1136    804.2    72.50
  71      193      17   0.08808     9728     857    750.9    77.19
  72      158      13   0.08228     8478     698    695.7    82.06
  73      132       9   0.06818     7341     501    642.4    87.51
  74      109       7   0.06422     6316     406    589.1    93.27
  75       91       8   0.08791     5399     475    537.7    99.59
  76       74      10   0.13514     4586     620    486.8   106.15
  77       58       8   0.13793     3871     534    440.3   113.74
  78       45       4   0.08889     3246     289    393.8   121.32
  79       37       2   0.05405     2705     146    351.9   130.09
  80       31       5   0.16129     2239     361    311.8   139.26
  81       24       6   0.25000     1842     461    274.5   149.02
  82       18       2   0.11112     1506     168    241.6   160.42
  83       15       4   0.26667     1223     326    209.5   171.30
  84        9       3   0.33334      987     329    181.5   183.89
  85        6       2   0.33334      792     264    155.9   196.84
  86        3       -   0.00000      631       0    133.4   211.41
  87        2       1   0.50000      499     250    113.4   227.26
  88        2       1   0.50000      393     197     95.5   243.00
  89      0.5       -   0.50000      307     154     79.2   257.98

Note: — The observations above age 87 are not reliable. 
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